To Dale


Now  it  is  fog,  I  walk 
Contained  within  my  coat; 

No  castle  more  cut  off 
By  reason  of  its  moat: 

Only  the  sentries  cough, 

The  mercenaries  talk. 

The street lamps, visible,
Drop  no  light  on  the  ground, 
But  press  beams  painfully 
In  a  yard  of  fog  around. 

I  am  condemned  to  be 
An  individual. 

In  the  established  border 
There  balances  a  mere 
Pinpoint of consciousness.

I  stay,  or  start  from,  here: 
No  fog  makes  more  or  less 
The  neighboring  disorder. 

Particular,  I  must 
Find  out  the  limitation 
Of  mind  and  universe. 

To  pick  thought  and  sensation 
And  turn  to  my  own  use 
Disordered  hate  or  lust. 

I  seek,  to  break,  my  span. 

I  am  my  one  touchstone. 
This  is  a  test  more  hard 
Than  any  ever  known. 

And  thus  I  keep  my  guard 
On  that  which  makes  me  man. 

Much  is  unknowable. 

No  problem  shall  be  faced 
Until  the  problem  is; 

I,  born  to  fog,  to  waste, 
Walk  through  hypothesis, 

An  individual. 

Thom Gunn

ABSTRACT


Language  learning  is  an  example  of  a  task  that  is  not, 
in  general,  recursively  solvable,  but  for  which  non- 
algorithmic  solutions  exist  that  give  insight  into  many 
areas  of  theoretical,  and  potentially  practical,  interest.  A 
scenario  for  language  learning  is  proposed  that: 

- parallels that used for the inductive inference of
  partial recursive functions,

- permits a precise description of the relationship
  between function and language learning,

- suggests how language learning might be rephrased in a
  fuzzy context.

The  possibilities  for  function  and  (non-fuzzy)  language 
learning  are  surveyed.  Several  problems  commonly  confused 
with  language  learning  are  outlined  and  their  relationship 
clarified. 

Fuzzy  formal  languages  result  from  the  continued  quest 
to  make  formal  languages  somewhat  closer  to  natural 
language,  by  making  membership  in  a  formal  language 
gradable.  The  various  types  of  grammars  for  fuzzy  languages 
are  critically  analyzed.  Some  comments  are  made  on  how  the 
"generation  problem"  might  be  approached  in  the  future. 

The  previous  work  dealing  with  the  learning  of  fuzzy 
languages  is  critically  examined.  A  similar  solution 
involving only the assignment of grammaticalities to rules
is given. It is noted that both solutions rest upon the
dubious assumption that a superset of some set of correct
rules is known. A new outlook, arising from the unique
presentation  employed  in  earlier  chapters,  is  suggested  and 
a  theorem  is  shown  that  establishes  the  equivalence  of 
certain  partial  recursive  functions  and  fuzzy  grammars.  This 
leads  to  a  general  method  of  solution  when  fuzzy  grammars 
are  used  to  name  the  target  language. 

Viewed philosophically, fuzzy languages seem to require
a  more  approximate  criterion  for  learning  than  any 
previously  given,  one  that  permits  an  infinite,  yet  bounded, 
number  of  discrepancies  between  target  and  hypothesis. 
Previous  material  on  approximate  learning  is  surveyed.  A  new 
criterion  for  learning  implied  by  the  previous  work  for 
fuzzy  languages,  "order  matching",  is  defined  and  shown  to 
be  reducible  to  the  usual  concept  of  matching.  A  new  notion 
extending  the  previous  work  for  approximate  function 
learning,  "E-identification",  is  defined.  It  is  shown  that 
E-identifiers can learn very large classes of total
recursive functions, and are more powerful on the total
recursive functions than almost everywhere identifiers.
E-identification bounds the overall proportion of differences
between the target and hypothesis. In order to permit the
overall proportion of differences for each range value of
the target to be bounded, E_range-identification is defined
and, for finite ranged functions, shown to entail
E-identification. The theorems for E-identification are
restated for E_range-identification. Finally, the equivalent
results are stated for the languages generated by fuzzy
grammars.


Chapter 1


INTRODUCTION


The true guarantee of the validity of induction is that
it is a method of reaching conclusions that, if it be
persisted in long enough, will assuredly correct any
error concerning future experience into which it may
temporarily lead us.

C. S. Peirce

A method of solution is perfect if we can foresee from
the start, and even prove, that following that method we
shall attain our aim.

Leibnitz

The  solution  to  a  problem  changes  the  problem. 

Peer's  Law 


1.1  The  Problem,  Intuitively 

A problem that has received considerable attention^1
since its formulation by Chomsky <Chomsky,1957;1965> is,
simply stated, that of discovering a name for a formal
language L <Hopcroft and Ullman,1969>, given only a finite
sample of L (and perhaps L complement) from which to make
the inference. This is commonly known as "grammatical
inference" since some form of grammar, often a Chomsky Type
grammar, is what is conventionally meant by a "name" for a
formal language.

^1 For the two major, albeit incomplete, surveys see
<Biermann and Feldman,1972> and <Fu and Booth,1975a>. The
latter stresses "practical solutions" that demonstrate a
tendency to confuse this with "the good encoding problem" as
discussed later in this chapter.

The general problem dealt with by this thesis is the
reinterpretation of Chomsky's original task within a fuzzy
or vague context. First, a statement and analysis of
language learning for fuzzy formal languages^1 is given along
the lines Chomsky proposed for (nonfuzzy) formal languages.
Second, a new approximate notion of language learning,
consistent with the spirit of fuzzy languages, is analyzed.

^1 i.e. languages for which membership is gradable

The problem of fuzzy language learning manifestly
depends upon developments in both the theory of formal
language learning and the concept of fuzziness. The former
suggests the problem's outline, and the latter its
elaboration.

1.2 Chomsky's Problem in Context

Although the problem of natural language learning has a
very long history <Chomsky,1975>, the study of the learning
of formal languages from examples and, possibly,
counterexamples, of course arose only after the creation of
formal languages. These reflect not only a relatively
primitive conception of language as simply a corpus of
utterances, but also a uniquely Chomskian outlook that
"language shall no longer be regarded as a corpus of
utterances per se, but rather as the abstract system of
rules that underlies these utterances" <Chomsky,1957>.
Formal languages incorporate Chomsky's view that natural
language possesses a non-trivial structural or syntactic
component (i.e. one that is more than a mere list built up
by repetition and "analogy") that is independent of any
semantic considerations.

Our models of natural language have steadily become
more semantically based and complex^1, while work on formal
language learning has continued to be exclusively syntactic,
despite the occasional, rather dubious, claim to the
contrary <Crespi-Reghizzi,1971>. Although the recent work in
formal semantics <Stoy,1977> may eventually provide an
opportunity to rectify this, the divergence has meant that
the relationship between natural and formal language
learning studies has become somewhat tenuous. A subject that
often appears to impinge upon both, namely the so-called
"computational study of language acquisition" <Reeker,1976>,
has had in fact no particular relevance for either. The
motivations and questions in these fields are often similar,
as a comparison of the studies of Gold <1967>, Shrier and
Brown <1978>, Reeker <1976>, and Dale <1972> shows
particularly well; and occasionally even the resultant
research is similar, as the formal studies by the
psycholinguists Hamburger and Wexler <1973a,1973b,1975>
demonstrate. The distinguishing factor seems to be the
relative importance assigned to certain features of natural
language versus the tractability of formal language, which
determines the role formal languages can play in
understanding natural language phenomena^2.

^1 See, for example, <Katz and Fodor,1963> for the
desirability of a semantic emphasis in linguistics, and
<Charniak and Wilks,1976> for some developments along these
lines in Artificial Intelligence.

^2 See Levelt <1974> for an excellent appraisal of this
issue.

It is the author's belief that, in the current absence
of any markedly linguistic constraints, the learning of
formal languages does not yet provide an adequate paradigm
for even the syntactic component of natural language
learning. It appears to lack many of the features that serve
to distinguish natural language learning from more general
inferencing, for example, uniformity, rapidity, comparative
intellectual ease, and freedom from motivation and emotional
state <Chomsky,1965; Miller,1967; Dale,1972>. However, this
picture could change with some of Angluin's research
<Angluin,1974;preprint> that consciously seeks to
incorporate these qualities into a formal language
situation, and with recent experiments <Reber,1977> that
suggest the differences between natural and artificial
language learning situations may not be nearly so great as
previously believed.

The well known equivalence between grammars, machines,
and programs or partial recursive functions <Hopcroft and
Ullman,1969> has resulted in the formulation of problems in
a variety of different terminologies that are very similar
to that of learning a formal language. Studies on the
"identification of a finite state machine from a sample of
its input/output behavior", "the automatic programming of a
task given examples", and the "inductive inference of a
partial recursive function from some partial enumeration"
are all extremely relevant to the task of learning a formal
language. So relevant, indeed, that researchers in any one
of these areas customarily cite results deriving from all.

Solutions  to  the  fuzzy  language  learning  problem  hinge 
upon  an  explicit  unification  of  the  functional  and 
linguistic  approaches.  The  exact  relationship  between  these 
two  areas  is  not  immediately  apparent  and  indeed  has  been 
variously interpreted <Blum and Blum,1975; Feldman and
Shields,1977; Wiehagen,1977>. It is elaborated upon
considerably  in  Chapters  Two  and  Three. 

Caution  is  necessary  in  interpreting  the  results 
peculiar  to  the  various  notations,  since  each  can  encourage 
subtly  different  formulations,  the  differences  of  which  may 
not be immediately apparent, as Gold's <1967> "Black Box
Identification" illustrates. However, with this in mind, the
problem of learning a formal language may be viewed in the
light of any of the developing theories of inductive
inference couched in "effective" terms.^1 So it is
particularly surprising that the vast philosophical
literature on induction <Barker,1957> seems to be almost
wholly  irrelevant,  of  use  only  in  pointing  out  the 
difficulties  that  must  beset  any  such  enterprise. 

Two  philosophical  conundrums  challenge  the  very 
possibility of a solution. Goodman's Paradox states that for
every predicate P and every finite set O of objects, there
is another predicate P* equivalent to P on O and to not-P on
the complement of O <Kutschera,1973>. This merely formalizes the
commonplace  observation  that  an  infinite  language  cannot  be 
characterized  logically  by  any  finite  set  of  examples  and 
counter-examples.  Hume's  Paradox  then  asks:  If  in  inductive 
inference  the  hypothesis  or  theory  P  is  not  logically  or 
deductively  contained  within  the  given  data  C,  that  is  to 
say  that  in  some  worlds  C  does  not  arise  from  P  but  from 
some different premise P*, then what is there to guarantee
that  we  are  in  a  world  where  the  inductive  inference  of  P, 
and  not  P*  say,  is  correct?  Nothing.  Consequently  the 
current  philosophical  consensus  appears  to  be  that  induction 
can  not  be  justified  deductively. 
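
The construction behind Goodman's Paradox can be made concrete with a small sketch. The names `rival_predicate`, `even_length`, and `sample` below are illustrative assumptions, not notation from this thesis; the point is only that a rival predicate P* agreeing with P on every observed object, and disagreeing everywhere else, always exists.

```python
from typing import Callable, Iterable

Predicate = Callable[[str], bool]


def rival_predicate(p: Predicate, observed: Iterable[str]) -> Predicate:
    """Build P*: identical to p on the observed objects, the negation of p elsewhere."""
    seen = frozenset(observed)

    def p_star(x: str) -> bool:
        return p(x) if x in seen else not p(x)

    return p_star


def even_length(s: str) -> bool:
    """An example predicate P naming the language of even-length strings."""
    return len(s) % 2 == 0


# A finite sample of examples and counterexamples (hypothetical data).
sample = ["", "a", "aa", "aaa"]
p_star = rival_predicate(even_length, sample)

# P and P* cannot be told apart on the sample ...
agree_on_sample = all(even_length(s) == p_star(s) for s in sample)
# ... yet they disagree on a string outside it.
differ_outside = even_length("aaaa") != p_star("aaaa")
```

No finite sample, however large, rules out every such rival, which is why the data alone cannot single out one hypothesis.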

^1 This equation of language learning to general theory
formation is dependent upon the view, realized by formal
languages, that a grammar is a theory for a language
<Chomsky,1957>. As noted previously this equation may now
appear somewhat perverse to those whose concern is natural
language <Derwing,1973>.

Of course induction does appear to work, and so
philosophers have striven to provide rational, rather than
purely logical, justifications for it <Black,1970b>.
Depending upon the reader's tastes these may or may not
prove satisfactory. Of more significance here, however, are
the attempts to construct an "inductive logic", in which
rather than requiring deductive validity of an inductive
argument, a degree of probability or "confirmation" is
assigned to it. That is, an inductive logic would provide a
mechanism whereby (possibly conflicting) hypotheses could be
ranked in the light of the data currently available.
According to Morgan <1971a>, philosophers have concentrated
almost exclusively upon this Confirmatory problem.
Unfortunately the very nature of "confirmation" is beset by
paradoxes <Salmon,1973> which grow, if not worse, at least
more explicit in the usual probabilistic treatments of
induction <Gardiner,1978>.

Two  of  the  main  reasons  for  the  curiously  diminished 
relevance  of  philosophical  studies  to  this  thesis  are 
apparent  from  the  previous  paragraph.  First,  the  issue  of 
confirmation  (as  distinct  from  validation)  can  wrongly  focus 
attention upon the merits of particular inductive arguments:
"Given the data, which of the available, deductively
adequate hypotheses is more likely?" Although this approach
appears in some (inconclusive) attempts at constructive
grammatical inference <e.g. Cook and Rosenfeld,1974>, the key
to the logical justification of induction, and the
fundamental outlook of the material covered in this thesis,
is that it is inductive strategies which must be evaluated:
"Given  a  growing  set  of  data,  will  a  particular  strategy 
eventually  arrive  at  a  correct  hypothesis?"  Herein  is  the 
avenue  to  a  deductive  treatment  of  induction,  and  a  rigorous 
examination  of  its  potential.  And  although,  as  this 
chapter's  opening  quotations  indicate,  several  philosophers 
may  have  realized  this,  their  insight  remained  undeveloped. 
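
To illustrate what evaluating a strategy, rather than a single inference, might look like, the following is a minimal sketch of an enumerative strategy run against a growing data set. The hypothesis class (functions of the form a*x + b with non-negative integer a and b) and the names `hypotheses`, `guess`, and `run_strategy` are assumptions chosen for illustration only; the sketch never terminates if the target lies outside the enumerated class.

```python
from typing import Callable, Iterator, List, Tuple

Hypothesis = Callable[[int], int]


def hypotheses() -> Iterator[Tuple[str, Hypothesis]]:
    """Enumerate every function x -> a*x + b with a, b >= 0, ordered by a + b."""
    total = 0
    while True:
        for a in range(total + 1):
            b = total - a
            yield f"{a}*x+{b}", (lambda x, a=a, b=b: a * x + b)
        total += 1


def guess(data: List[Tuple[int, int]]) -> str:
    """Return the first enumerated hypothesis consistent with all data seen so far."""
    for name, h in hypotheses():
        if all(h(x) == y for x, y in data):
            return name
    raise AssertionError("unreachable: the enumeration is infinite")


def run_strategy(target: Hypothesis, steps: int) -> List[str]:
    """Feed the strategy one more data point per step and record each guess."""
    data: List[Tuple[int, int]] = []
    guesses: List[str] = []
    for x in range(steps):
        data.append((x, target(x)))
        guesses.append(guess(data))
    return guesses


# Hypothetical target; the strategy is never told when it has become correct.
guesses = run_strategy(lambda x: 2 * x + 3, 5)
```

The early guesses may be wrong, but each wrong guess is eventually contradicted by new data, so the sequence of guesses settles on a correct name for any target in the class. It is this property of the strategy, not of any one guess, that can be proved.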

The second debilitating feature of philosophical
studies, in so far as the interests of this thesis are
concerned, also stems from their concentration upon
Confirmation. The Discovery process surely is of fundamental
significance to our problem, yet it has usually been shelved
pending a full explication of Confirmation <Morgan,1971a>.
It has even been declared extra-logical^1 and best studied by
analyzing society, history or the creative genius
<Toulmin,1973; Hadamard,1954>. This deficiency is not really
remedied here. This thesis analyzes only one particular
creative method, that of enumeration with possible tests
(although many of the results are valid for all possible,
effective, methods). The failure to realize that these
results provide few if any suggestions for non-enumerative
learning methods is responsible for the voluminous, but to
date ineffectual, studies on "constructive" schemes, about
which more is said towards the end of Chapter Three. So
then, although new methods of Discovery would be extremely
relevant here, the philosophers have not been disposed to
look for them, believing with Popper that there is no "logic
of discovery" to discover.

^1 "There are... no generally applicable 'rules of
induction', by which hypotheses or theories can be
mechanically derived or inferred from empirical data. The
transition from data to theory requires creative
imagination." Hempel cited in <Derwing,1973>

One  final  point  about  the  philosophical  studies  on 
induction  is  that  they  have  operated  outside  of  the 
computational  orientation  which  is  central  here.  This, 
together  with  the  other  factors  noted,  means  that  explicitly 
philosophical  material  is  of  only  peripheral  relevance.  Even 
Karl Popper's work on the "hypothetico-deductive" method
<1968> fails to make specific, computational suggestions
other than the basic one that successful strategies should
employ a "generate and test" strategy operating upon a basis
of falsification rather than confirmation. In fact the
influence seems, if anything, to flow in the opposite
direction, with Case and Smith <1978> and Kugel <1977>, to
name but a few, explicitly detailing the import of their
work for the philosophy of science. There appears to be
particular relevance for Chomsky's LAD <Chomsky,1975> and
the rationalist-empiricist debate <Derwing,1973>. However,
scientific theories are judged on many more grounds than
generative or predictive adequacy, simplicity, or indeed any
of the criteria of the formal studies discussed in this
thesis <Toulmin,1963,1973>.

Even computer scientists committed to the creation of a
logic of discovery often fail to come to grips with the
central dilemma in the learning of formal languages. Men
such as Hajek <1975>, Meltzer <1970>, Morgan <1971>, and
Plotkin <1971>, trying to design explicit "logics of
discovery", generate inductively complete sets of hypotheses
for given finite sets of data. Trivial and contradictory
hypotheses abound in such complete sets. Moreover, for any
static set of data, such as the above researchers are
concerned with, the philosophical conundrums mentioned
earlier ensure the impossibility of picking out the correct
hypothesis. Consequently their work will be of relevance
only if it is incorporated into some strategy dealing with
growing data sets.^1

^1 A suggestion along this line <Schubert,personal
communication> that appears capable of speeding up and
perhaps making more practical some of the enumerative
approaches is discussed later.

There are several other problems that, although often
confused with the topic of this thesis <e.g. Gaines,1977>,
really avoid the central difficulty of the language learning
problem. Two of these are the "good encoding problem", and
the "finite selection problem". The good encoding problem
may be stated as: How, from a finite sample S of a formal
language L (and perhaps its complement), can one discover
"good" names for S (Note: for S not for L) <Paley,1977>?
Solutions to the good encoding problem may be thought of as
logics of discovery that filter hypotheses by a type of
"confirmatory measure" for which intrinsic properties of the
hypothesis (for example simplicity) rather than any
relationship to L are what matter. The language learning and
the good encoding problems coincide when L is finite. This,
together with the overlap in terminology and the fact that
an immediate concern with good encodings can go hand in hand
with the larger problem of ultimately acquiring a correct
name for L, makes it difficult to distinguish which problem
is addressed by some authors. This is particularly true of
most of the conceptual or structural learning programs in
Artificial Intelligence <e.g. Winston,1970>, but occurs
even in adequately formalized theories such as the General
Systems approach to the identification of "generative
structures in observational data" <Klir,1976>^2.

Characteristic of such work is that it is "situation
static", and is only shown to provide subjectively
reasonable solutions.

^2 <Zalecka-Melamed,1977> makes the point that these theories
are really concerned with what is called later
"identification in known time".

Take the language learning problem and modify it by the
provision of sufficient (usually a priori) additional
information to enable all but a finite number of hypotheses
to be discarded out of hand and one has the finite selection
problem. Finite state machine identification provides
perhaps the prime example of this <Moore,1956; Gaines,1975>.
Here the usual ploy is to assume knowledge of the (maximum)
number of states and input/output symbols. Conceptually the
problem is then trivial. Since there are only a finite
number of fsm's satisfying these bounds, the target machine
can be identified by forming their direct sum and conducting
a homing experiment <Kohavi,1978>. Such variations can lead
to a host of very practical problems, for example in fsm
fault detection <Kohavi,1978>.
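
Why the bounded problem is conceptually trivial can be seen in a small sketch. It is deliberately simpler than the direct sum and homing experiment cited above: it assumes a known start state and the ability to reset the black box between experiments, and it eliminates candidates by brute force. The alphabets, the dictionary encoding of a machine, and the names `all_machines`, `respond`, and `select` are all illustrative assumptions.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# A Mealy-style machine: (state, input symbol) -> (next state, output symbol).
Machine = Dict[Tuple[int, str], Tuple[int, str]]

INPUTS = "ab"    # assumed input alphabet
OUTPUTS = "01"   # assumed output alphabet


def all_machines(n_states: int) -> List[Machine]:
    """Enumerate the finitely many machines with exactly n_states states."""
    keys = [(s, i) for s in range(n_states) for i in INPUTS]
    moves = [(t, o) for t in range(n_states) for o in OUTPUTS]
    return [dict(zip(keys, choice)) for choice in product(moves, repeat=len(keys))]


def respond(m: Machine, word: str, start: int = 0) -> str:
    """Output string produced by machine m on input word, from the start state."""
    state, out = start, []
    for symbol in word:
        state, o = m[(state, symbol)]
        out.append(o)
    return "".join(out)


def select(n_states: int, black_box: Callable[[str], str]) -> List[Machine]:
    """Keep only candidates that match the black box on every word of length 2n-1."""
    survivors = all_machines(n_states)
    for word in map("".join, product(INPUTS, repeat=2 * n_states - 1)):
        observed = black_box(word)  # one experiment, then an assumed reset
        survivors = [m for m in survivors if respond(m, word) == observed]
    return survivors


# Hypothetical two-state target, hidden behind a black-box interface.
target: Machine = {
    (0, "a"): (1, "0"), (0, "b"): (0, "0"),
    (1, "a"): (0, "1"), (1, "b"): (1, "0"),
}
survivors = select(2, lambda w: respond(target, w))
```

Words of length 2n-1 suffice here because two states of the 2n-state direct sum of a pair of candidates, if distinguishable at all, are distinguishable by an input word no longer than that. The a priori bound on the number of states is what makes the candidate set finite and the experiment length computable in advance.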

Finally,  there  is  a  curious  hybrid  between  the  language 
learning  and  finite  selection  problems.  This  is  obtained  by 
the  provision  to  the  language  learner  of  information 
additional  to  examples  and  counterexamples  from  the  target 
language,  yet  insufficient  necessarily  to  reduce  the  problem 
to  the  finite  selection  case.  The  provision  of  partial 
parsing information for the data by Crespi-Reghizzi <1971>
is a good example of this. More often the studies on this
problem involve examples of a program's input-output
together with program traces <e.g. Barzdin and Freivald,1972;
Biermann and Feldman,1972a; Siklossy and Sykes,1975>. Since
the  boundaries  are  not  sharp  between  the  categories  of 
language  learning,  good  encoding,  finite  selection,  or  this 
additional  information  situation,  where  a  particular  study 
should  be  placed  is  often  problematic. 

It  should  be  clear  that  a  solution  to  the  language 
learning problem depends upon a very strong notion of
"pattern" or "structure", one involving the existence of a
mechanism for its generation. As the search for regularity
in the environment, language learning studies are related to
the countless other endeavours in this direction:
statistical analysis, classical pattern recognition,
information theory, and so on. Yet beyond this commonality
of interest there is little similarity apparent even when,
as in <Watanabe,1969>, the subject is explicitly that of
"induction".

However, certain deep connections are appearing that
should be mentioned, if only briefly. First of all, the
non-probabilistic inference outlined by this thesis is the
simplest case of the more general probabilistic inference
which since <Solomonoff,1964> has blossomed into a far
reaching discipline of its own^1. This in turn is related to
classical probability theory as follows. Classical studies
left the notion of "random" (and hence of "non-random" or,
intuitively, patterned) as an undefined primitive, a
characteristic of a process rather than a sequence. Over the
last decade various researchers have striven to give a
theory of randomness in terms of general models of
computability, and thereby erect a new, constructive, theory
of probability^2. Developments in complexity theory have
also been intimately involved with this <Schnorr,1973>.
Recently a direct link has been shown to exist between the
precise notions of "random" and "predictable" <Levin,1973;
Schubert,1977>.

^1 See <Solomonoff,1975> for some recent developments and a
partial overview.

^2 See <Schubert,1977> for some recent developments and an
overview; <Humphreys,1977> for a philosophical discussion.

At a more superficial level the relationship between
the language learning problem and these other theories is
one of mutual borrowing. For instance, both classical and
Bayesian statistical methods are used heavily in stochastic
language learning <Fu and Booth,1975b>. A whole new field
with numerous applications <Fu,1977>, Syntactic Pattern
Recognition, has grown out of classical pattern recognition
due to the influence of grammatical inference and the
invention of picture grammars <Fu,1974>.

Although still primarily of only theoretical interest,
as both the highly abstruse mathematics of <Lindner,1974>
and the practical calculations in <Wharton,1977> attest, the
study of language learning has been turned to several quite
practical ends. It has been applied to the inference of
biologically relevant L-systems^1, the design of programming
languages <Crespi-Reghizzi,1973>, and to the automatic
construction of transition network grammars <Chou and
Fu,1972> popular in A.I. studies of natural language
<Kaplan,1972>. Some researchers <Biermann and Smith,1977>
are using it to study automatic programming^2 and, as
mentioned previously, pattern recognition <Evans,1971>.

^1 See <Herman and Walker,1972> for the first treatment,
albeit one focussing more on the good encoding problem. <Coy
and Pfluger,1979> gives many results and shows their
relationship to the standard language and function learning
material.

^2 However the more customary approach is to attempt the
inference of a program from some semi-formal description in
another language. Strictly speaking, this is more of a
problem in translation than of inductive inference
<Biermann,1976>.

1.3  Fuzzy  Language  Learning  in  Context 

The peculiarities inherent in human imprecision have
long been known to the philosophers <Black,1970a>. With
Zadeh's creation of "fuzzy" set theory in 1965, the issue of
imprecision could be analyzed precisely. Humans use such
vague concepts as "long", "old", "relevant", and so on, to
great advantage. The hope is that "fuzziness" models this
everyday phenomenon sufficiently well to be useful even if
it is not unchallengeable <Stallings,1977>.

Although there has been some effort towards the
creation of a fuzzy deductive logic <Zadeh,1977>, there have
been no fuzzy inductive logics developed.

Learning  a  fuzzy  language  is  related  to  previous  work 
in  fuzziness  since  a  fuzzy  formal  language  is  defined  to  be 
a  fuzzy  set  of  sentences  constructed  from  some  finite 
vocabulary  <Lee  and  Zadeh,1969>,  and  so  the  learning  of 
fuzzy  languages  can  be  thought  of  as  what  Zadeh  termed  the 
problem of abstraction <Bellman et al.,1969>. This can be
seen as an attempt to make the inference of formal languages
closer to the natural language situation <Tamura and
Tanaka,1973>, as a theoretically interesting extension of
the usual studies on the acquisition of formal languages, or
even as possibly leading to an often called for aid
<Fu,1974> to those engaged in fuzzy syntactic pattern
recognition <e.g. Thomason,1973; Kickert and Koppelaar,1976;
De Palma and Yau,1975>.

1.4  Formalisms  for  "Solvability" 

In a 1962 paper Shamir remarked informally that since
it is possible to have two distinct (infinite) languages
coinciding upon any specified finite set of strings
(Goodman's paradox revisited), it is impossible in general
to discover a correct grammar from only a finite sample of
an infinite language*. Moore <1956> proved that for an
arbitrary finite state machine M, even if it is permissible
to specify any finite input sequence S for M to respond to,
there will be other non-equivalent machines that have the
same output sequence for S and hence

"it will never be possible to perform experiments on a
completely unknown machine which will suffice to
identify it from among the class of all sequential
machines."

Wiehagen <1978> characterized the classes of languages for
which Chomsky's problem is recursively solvable, showing
them to be relatively trivial (cf. 2.2.1).

This perhaps explains the appeal of the variants of the
language learning problem mentioned earlier. For if Church's
Thesis were fully endorsed and any intuitively "solvable"
problem were required to be solvable in the usual Turing
Machine sense, then the quest for general solutions to the
language learning problem would be futile. To paraphrase
<Gold,1967>, given only a finite amount of information and
no a priori basis for choosing among logically valid
inferences, a learner cannot possibly avoid making mistakes.
Consequently, all that a learner should be expected to do is
employ a sound METHOD of making inductive inferences, not
always to make the particular inference that is correct.

* Although the point was not emphasized before, the problem
assumes that the samples are not of some special sort such
as the "representative samples" of Schubert <1974b>.

In the mid-sixties another conception of solvability
arose that "overshoots the bounds of Church's Thesis"
<Crisculo et al.,1975>. Putnam's "trial and error
predicates" <1965> and Gold's "limiting recursion" <1965>
mimic the activity of a successful scientist. Unlike
Turing's calculator of arithmetic sums <Turing,1950> which
must halt and announce its final solution if it is to
succeed, the scientist is not expected to ever cease the
calculation of new and better theories as increasing amounts
of data become available. Crudely put, the requirement for
success is only that the sequence of hypotheses be
convergent, somehow, to "The Truth". The distinction between
these two kinds of solution has been compared to that
between a procedure and an ongoing process <Crisculo et
al.,1975>.

Gold's formulation considers a Turing Machine T to
successfully compute the value of a function f at x if T
gives an infinite sequence of outputs, only finitely many of
which are different from f(x). A function f that can be so
computed, by one Turing Machine, at every point of f's
domain, is called "limiting recursive". More rigorously,
"limit" is taken to be a functional operator that associates
to each total function g of n+1 variables a partial function
f of n variables such that:

$$f(x_1,\ldots,x_n) = \begin{cases} \lim_{m\to\infty} g(x_1,\ldots,x_n,m) & \text{if the limit* exists} \\ \text{undefined} & \text{otherwise} \end{cases}$$

A (partial) function is said to be a (partial) limiting
recursive function if it is expressible as the limit of a
total recursive function. A set is said to be limiting
recursively enumerable if it is the domain of a partial
limiting recursive function. Terminology and results for
limiting recursion, akin to those of recursive function
theory, can be developed considerably further <Goetze and
Klette,1974; Crisculo et al.,1975>.

Limiting recursion is more powerful than normal
recursion. For instance, the classic "unsolvability" result,^2
"The Halting Problem", is limiting recursively solvable.
All that is required is a modified Universal Turing Machine
U which when fed a T.M. index i outputs "no" unless and
until its simulation of i has halted, upon which it outputs
"yes" thereafter. Notice that the strategy S of taking the
current answer provided by U as the correct one guarantees
that only a finite number of mistakes will be made before S
is operating upon the correct assumption. This is a general
feature of limiting recursive solutions. A feature of this
solution that is not characteristic of limiting recursive
solutions in general is the knowledge that if a "yes" occurs
all further computations are redundant since "yes" must
then, by construction of U, be the correct answer. In
general, although from some point on in the computation of a
limiting recursive function f at x only f(x) is returned, it
is not possible to determine when that point has been
reached. In short, although the Turing Machine eventually
"knows" the correct answer, it may never be able to know
that it knows and so halt.

* This is the usual number theoretic "limit", namely $\lim f(x) = a$
iff $f(x) = a$ for almost all $x \in N$.

2 Actually Putnam's "2-trial predicates" suffice to solve
membership in K (i.e. the set of programs that halt upon
their own index). These are weaker than limiting recursive
predicates since ($\exists k$ : P is a k-trial predicate) iff
($P \in \Sigma_1^*$) <Putnam,1965>. A commonly known, still more
"unsolvable" problem that is limiting recursively solvable
is the "Busy Beaver" problem <Ausiello and Protasi,1975>.
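The behaviour of the modified machine U can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a "program" is modelled as a Python generator function in which each `next` call is one simulation step, and the names `halts_within`, `guess`, `halting_program` and `looping_program` are hypothetical, not taken from the text. The function `guess(program, m)` plays the role of the total function g whose limit in m is the answer.

```python
from typing import Callable, Iterator

# Assumption: a program is a generator function; each next() is one step.
Program = Callable[[], Iterator[None]]


def halts_within(program: Program, steps: int) -> bool:
    """Simulate `program` for at most `steps` steps; True if it halted."""
    run = program()
    for _ in range(steps):
        try:
            next(run)
        except StopIteration:
            return True
    return False


def guess(program: Program, m: int) -> str:
    """The m-th output of U: "no" until the simulation halts, then "yes"."""
    return "yes" if halts_within(program, m) else "no"


def halting_program() -> Iterator[None]:
    # Hypothetical program that halts after five steps.
    for _ in range(5):
        yield


def looping_program() -> Iterator[None]:
    # Hypothetical program that never halts.
    while True:
        yield


halting_guesses = [guess(halting_program, m) for m in range(10)]
looping_guesses = [guess(looping_program, m) for m in range(10)]
```

Each individual guess is computed by a terminating procedure, and the sequence of guesses changes at most once. What no finite prefix reveals is whether a run of "no" answers is final, which is the point made above about not knowing when the limit has been reached.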

To illustrate the scope of limiting recursive
techniques, it is necessary to outline something called the
"Arithmetical Hierarchy" <Rogers,1967>. This serves as a
standard means of ordering the different degrees of
recursive unsolvability. An n-ary relation R is in the
arithmetical hierarchy iff it is recursive or can be
expressed as $\{(x_1,\ldots,x_n) : (Q_1 y_1)\ldots(Q_m y_m)\, S(x_1,\ldots,x_n,y_1,\ldots,y_m)\}$
where each $Q_i$ is either $\forall$ or $\exists$ and is over
numeric not functional coordinates (however the $x_i$ may be
function indices), and S is an (n+m)-ary recursive relation.
The expression within the brackets is called a predicate
form for R. It can be shown that if a relation R can be
stated within quantificational logic using recursive
relations, then it is in the arithmetical hierarchy (the
converse is also true, trivially). This is the case iff R is
definable within elementary arithmetic, hence the name. It
is a curious fact that it is the minimum number of
quantifier alternations (i.e. number of adjacent but unlike
quantifiers) in a relation's predicate form(s) that
determines its maximal degree of unsolvability. $\Sigma_n$ is
defined to be the class of all relations expressible by
predicate forms beginning with $\exists$ and having (n-1) quantifier
alternations. $\Pi_n$ is defined exactly as $\Sigma_n$ except that the
predicate forms must begin with $\forall$ rather than $\exists$. Often a
superscript 0 is added to these symbols in order to indicate
that the quantifications are over numeric rather than
functional coordinates. The smallest n for which a relation
belongs to $\Sigma_n$ or $\Pi_n$ indicates the relative recursive
unsolvability of the relation, with higher n denoting
"harder" problems.

This solvability hierarchy is a complicated affair, but
for our purposes here a few examples and one key result by
Kleene and Post should suffice. $\Sigma_0 = \Pi_0$ = the recursive
sets. $\Sigma_1$ is the class of recursively enumerable sets. {i :
the domain of the ith Turing machine is finite} is in $\Sigma_2$ and {i
: the domain of the ith Turing machine is infinite} is in
$\Pi_2$. In fact any sets in $\Sigma_2$ or $\Pi_2$ are "Turing reducible"^1 to
these two sets respectively. A particularization of the
Kleene-Post Theorem states that for any relation R, ($R \in
\Sigma_2 \cap \Pi_2$) iff (R is Turing reducible to K). Also, $R \in \Sigma_2$ iff R
is recursively enumerable in K.
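The bookkeeping behind the $\Sigma_n$ and $\Pi_n$ labels can be made concrete with a small Python sketch. It is offered as an illustration only: the function name `classify_prefix` and the encoding of a quantifier prefix as a string over "E" (for $\exists$) and "A" (for $\forall$) are assumptions introduced here. The function reports the class that one given predicate form witnesses. It does not find the smallest n over all possible predicate forms for a relation, which is not something a program can do in general.

```python
def classify_prefix(prefix: str) -> str:
    """Class witnessed by a quantifier prefix over 'E' (exists), 'A' (for all).

    The leading quantifier picks Sigma or Pi; the level is one more than
    the number of adjacent unlike quantifier pairs (alternations).
    """
    if any(q not in "EA" for q in prefix):
        raise ValueError("prefix must use only 'E' and 'A'")
    if not prefix:
        return "Sigma_0 = Pi_0"  # no quantifiers: a recursive relation
    alternations = sum(1 for a, b in zip(prefix, prefix[1:]) if a != b)
    family = "Sigma" if prefix[0] == "E" else "Pi"
    return f"{family}_{alternations + 1}"


# "Domain of machine i is finite": there is a bound n such that for all
# x and all step counts s, x > n implies i has not halted on x in s steps.
finite_domain = classify_prefix("EAA")
```

For the finite-domain example the prefix is "exists, for all, for all", which has one alternation and so witnesses membership in $\Sigma_2$, in agreement with the text.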

The power of limiting recursion can now be sketched in
terms of the Arithmetical Hierarchy. There are two^2 main
results of concern here:

* A predicate R is limiting recursive iff $R \in \Sigma_2 \cap \Pi_2$
<Gold,1965; Putnam,1965>. So the question of whether or
not R is true at a given point is limiting recursively
solvable iff R is Turing reducible to K.

* A predicate R is limiting recursively enumerable iff
$R \in \Sigma_2$ <Crisculo et al.,1975>. That is, the points for
which R is true can be effectively enumerated given only
some (arbitrary) enumeration of K.
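One direction of the first result can be read off directly from the definition of the limit. The following is a sketch of that step, assuming R is the limit of a total recursive 0/1-valued function g whose limit exists at every point (the symbols m and k are introduced here for the illustration):

```latex
R(x)
  \iff \lim_{m\to\infty} g(x,m) = 1
  \iff \exists m \,\forall k \ge m \; [\, g(x,k) = 1 \,]
  \iff \forall m \,\exists k \ge m \; [\, g(x,k) = 1 \,]
```

The second line is a predicate form beginning with $\exists$ and having one alternation, so $R \in \Sigma_2$. The third line holds because the limit is assumed to exist, and it begins with $\forall$, so $R \in \Pi_2$ as well.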


1 Intuitively speaking, a set A is Turing reducible to a set
B if the provision of an oracle to decide questions of
membership for set B allows the resolution by a Turing
machine of membership questions for set A. A very similar
notion is that of a set A being recursively enumerable in a
set B. This means that there is a Turing machine that, given
any enumeration of B, can then enumerate A. This is a
slightly weaker notion than Turing reducibility.

2 See <Jeroslow,1975> for an analysis of the scope of
limiting recursive methods in terms of the more usual
logical notions of "consistency" and "completeness".

1.5 More About Formalisms for "Solvability"

Very many of the investigations of language learning,
from Gold's initial demonstration in 1967 that it was
possible, through to Wiehagen's elaborate complexity and
numbering theoretic characterizations in 1978, depend upon a
limiting recursive functional to effect their solution. The
investigation of any problem is inexorably determined by
what is deemed to constitute an acceptable type of solution.
So when a recursive solution is sought, Chomsky's problem is
all but impossible, whereas when a limiting recursive
solution is sought, it is solvable for distinctly non-trivial
classes of languages. Much of this thesis is devoted
to the explication of these words and their realization in a
fuzzy context.

Notions  of  "solution"  other  than  "recursive"  and 
"limiting  recursive"  are  used  occasionally  in  language 
learning  studies.  The  two  most  frequently  occurring  ones  are 
generalizations  of  limiting  recursion. 

Schubert <1974a> defined a (partial) k-limiting
recursive function by applying the limit operator k times
(assuming the intermediate functions are total) to a total
recursive function. Of course the 1-limiting recursive
functions are just the limiting recursive functions. But the
limit operator is enormously powerful - the entire
Arithmetical Hierarchy can be characterized by repeated
applications of it^1 <Schubert,1974a; Crisculo et al.,1975>.
And for low values of k there are intuitive interpretations
that still appear to preserve a degree of "effectiveness"
<Schubert,1974a>. For example, 2-limiting recursion may be
modelled as an expanding community of processes of which
only finitely many never settle upon the correct answer.

Probabilistic limiting recursion is the second major
generalization of limiting recursion. If defined carefully,
this is also a very powerful approach. For example a
function f is "weak computable in the limit with
probability > 0" iff $f \in$ [...] <Freivald,1974>, where a function
is defined to be weak computable in the limit with
probability > p if there exists a Turing machine T with
access to a Bernoulli generator (p=1/2) such that:

a. if f(x) is defined then the probability of printing
an infinite output sequence with limit f(x) is > p
b. if $y \ne f(x)$ the probability of printing an infinite
output sequence with limit y is $\le p$.

The need to draw the limit somewhere, together with the
fact that one model has predominated to this date in the
language learning problem (and the related problems in the
other terminologies), means that this thesis deals almost
exclusively with answers based upon the limiting recursive
paradigm, and mentions the other generalizations only
occasionally to give some idea of the relationships.

1 Very briefly: A set R has a k-limiting recursive
function iff $R \in \Sigma_{k+1} \cap \Pi_{k+1}$. A set is k-limiting
recursively enumerable iff it is in $\Sigma_{k+1}$.

Chapter  Two  outlines  the  topic  of  function  learning, 
providing  the  framework  for  the  third  chapter  and  detailing 
the  major  theorems  that  constrain  solutions  to  the  formal 
language  learning  problem.  Chapter  Three  describes  at  length 
the  relationship  between  function  and  language  learning 
studies  and  discusses  certain  features  more  characteristic 
of the latter area. There is a dual emphasis in Chapters Two
and Three, namely the provision of a general basis for
comprehension,  comparison,  and  reformulation  with  respect  to 
fuzzy  language  learning,  and  the  description  of  the  specific 
results  relevant  to  approximate  learning.  Chapter  Four 
introduces  the  notion  of  "fuzziness",  and  analyzes  the 
various  suggestions  for  naming  fuzzy  languages.  And  Chapter 
Five  shows  how  the  previous  learning  material  can  be 
rephrased  in  a  fuzzy  context. 




Chapter 2


LEARNING  FUNCTIONS 


2.1  Introduction 

Perhaps  the  most  revealing  view  of  Chomsky's  problem 
stems  from  the  realization  that  learning  a  formal  language 
can  be  understood  as  the  learning  of  either  the  language's 
characteristic or semi-characteristic function. This quite
properly  suggests  that  the  greater  number  of  results  dealing 
with  function  learning  be  considered  in  any  investigation  of 
language  learning.  In  fact  the  inescapable  question  is  why 
there  are  two  areas  at  all;  why  is  there  not  a  single 
unified development? To quote <Feldman and Shields,1977>:
"there  has  been  surprisingly  little  carryover  from  the  one 
domain  to  the  other  ...[although]  a  common  understanding  of 
the  issues  seems  to  be  emerging". 

A  dual  presentation  is  maintained  in  this  thesis  for  a 
number  of  reasons.  Foremost  is  the  fact  that  as  informal 
terms  such  as  "learning"  are  exchanged  for  their  precise 
counterparts  certain  differences  appear  in  what  researchers 
in  the  two  fields  have  been  trying  to  do.  For  example,  the 
acceptance  of  extensions  to  partial  functions  has  been 
standard  in  the  function  learning  problem  whereas  it  is  not 
usually acceptable when considering a function as the
semi-characteristic function for some language. Furthermore,
while  the  function  enumerated  and  the  target  function  are 
synonymous  in  functional  studies,  this  is  not  always  the 
case  in  the  linguistic  research.  These  potential  differences 
are  expanded  upon  in  Chapter  Three.  The  second  reason  for 
retaining the function/language dichotomy is that functional
and  linguistic  terminologies  encourage  distinctive  habits  of 
thought  and,  by  rendering  certain  questions,  restrictions 
and  modifications  more  natural,  encourage  the  pursuit  of 
different kinds of results. For example, the largely
linguistically  motivated  distinction  between  examples  and 
counter-examples  plays  an  important  role  in  the  language 
learning  results,  but  only  rarely  is  the  corresponding 
functional  version  mentioned. 

For  these  reasons  then,  this  chapter  presents  a 
separate  outline  of  function  learning. 

The  terminology  and  ideas  dealing  with  function 
learning  have  the  advantage  of  clarity  and  relative 
simplicity  over  those  dealing  specifically  with  language 
learning.  So  much  so,  in  the  author's  opinion,  that  this 
thesis  attempts  to  maintain  the  style  of  the  functional 
material  through  into  the  linguistic  studies.  To  avoid  the 
confusion  so  easily  engendered  by  premature  generality,  this 
chapter  proceeds  by  adding  to  or  modifying  a  basic  model.  In 
contrast  to  this,  the  next  chapter  starts  with  a  general 
framework  that,  given  the  basic  understanding  developed 
here,  should  broaden  the  perspective. 


2.2 The Basic Models


2.2.1  Identification  in  the  Limit 

A few definitions are required to begin with. In
general the notation of <Hopcroft and Ullman,1969> and
<Rogers,1967> is used wherever appropriate. Functions are
usually assumed to be mappings from N to N (N is sometimes
identified with [...]), and a standard indexing of the partial
recursive functions is assumed throughout. $t_i$ stands for the
ith partial recursive function, and $T_i$ stands for a
computational complexity measure for $t_i$ of the type analyzed
in the survey article of Hartmanis and Hopcroft <1971>. R is
the class of total recursive functions. P is the class of
partial recursive functions.

Definition: An (arbitrary) enumeration $\hat{f}$ of a partial
recursive function f, is an infinite sequence of the
form $(v_1, v_2, v_3, \ldots)$ where either $v_i = *$ or $v_i = (x_i, f(x_i))$
with $x_i \in$ Domain(f) and every $x \in$ Domain(f) appearing
in some $v_j$.

Definition: $\hat{f} = (v_1, v_2, v_3, \ldots)$ is a primitive recursive
[effective] enumeration of a partial recursive function
f if $\hat{f}$ is an enumeration of f and $\exists$ a primitive
recursive [recursive] function $p: N \to (N \times N) \cup \{*\}$ such
that $p(n) = v_n$.

Definition: $\hat{f} = (v_1, v_2, v_3, \ldots)$ is an increasing
[methodical] [request] enumeration of a partial
recursive function f if $\hat{f}$ is an enumeration of f and
$v_n = *$ or ... [$v_n = *$ if f is undefined at z(n) and
$v_n = (z(n), f(z(n)))$ otherwise, where $z \in R$ is prespecified]
[to determine $v_n$, the inductive inference machine
specifies $x_n$, and $v_n$ is subsequently either * (if f is
undefined at $x_n$) or $(x_n, f(x_n))$].

Definition: A partial enumeration $\hat{f}_n$ of a function f, is
the finite sequence consisting of the first n elements
of an enumeration of f.

Definition: A Gödelization $[\hat{f}_n]$, of a partial
enumeration $\hat{f}_n$, is the natural number supplied by some
1-1 recursive mapping, from partial enumerations to N,
operating upon $\hat{f}_n$.

Definition: An inductive inference machine is a total
Turing Machine whose inputs and outputs (for the first
two models) are to be interpreted as Gödelized partial
enumerations and Turing machine indices respectively.

Definition: An inductive inference machine M converges
to i for an enumeration $\hat{f}$, if the sequence $M([\hat{f}_1])$,
$M([\hat{f}_2])$, $M([\hat{f}_3])$, ... has limit i.

Definition: An inductive inference machine M identifies
a function f in the limit* if for every enumeration of
f $\exists$ i such that M converges to i, and i is an index for
a program that computes some extension of f.

* The "in the limit" qualification is often omitted for
convenience.

Definition: An inductive inference machine M identifies
a class C of functions in the limit, if $f \in C$ implies
that M identifies f. A class C of functions is
identifiable if $\exists$ an inductive inference machine that
identifies C.

Definition: Program i is compatible with a partial
enumeration $\hat{f}_n$ if $t_i$ includes $\hat{f}_n$.^1

Definition: The identifying power of an inductive
inference machine M, is the largest class of partial
recursive functions that M can identify in the limit.^2

Definition: ID is the class of sets of total recursive
functions that are identifiable.

Definition: A Popperian machine is an inductive
inference machine that outputs only indices of total
recursive functions.

Definition: A finite function f is a function such that
Domain(f) is finite.

Definition: A total recursive function f is h-easy iff $\exists$
$t_i = f$ such that $T_i(x) \le h(x)$ for almost all $x \in N$, where
$h \in R$.

1 Note: For notational convenience, a sequence, that is a
function whose domain is N, is sometimes spoken of as if it
were its range. For example the sequence (1,2,3,...) is
spoken of as if it were {1,2,3,...} when terms such as
inclusion or containment are used.

2 A model of induction is loosely described as "more
powerful" than another if its power is larger than or
contains the other's. This apparently is contrary to
standard usage in linguistics.

Definition: A partial recursive function f is h-honest if
$\exists$ an extension $t_i$ of f such that $T_i(x) \le h(x, t_i(x))$ for
almost all $x \in$ Domain(f), where $h \in R$.

Definition: $f \in P$ is everywhere O-compressed for some
general recursive operator^1 O, if $\exists$ a program i
computing f such that for any other program j for f and
$\forall x \in$ Domain(f), $T_i(x) \le O(T_j)(\max(i,j,x))$. That is,
"modulo O", $t_i$ is the fastest program for f.

The problem of demonstrating the existence or
construction of an inductive inference machine that
identifies a given class of functions in the limit is the
main formalization of the intuitive goal of learning a
function from a finite set of input-output tuples. Before
going on, it should be emphasized once more that this goal
can be made rigorous in a variety of more or less plausible
ways. Two other models, "matching" and "extrapolation", are
discussed in the next two subsections since they fit into
very much the same framework. However there are still other
models dependent upon rather different frameworks that are
omitted. Most significantly, the desirable addition of
probabilistic considerations is not treated here. Thus the
learning of stochastic languages, as in <Horning,1969,1972>,
<Cook and Rosenfeld,1974>, <Booth and Maryanski,1977>, <Liou
and Dubes,1977>, <Shrier,1977>, or <Van der Mude,1978> for
example, is not considered (see <Fu and Booth,1975b> for a
good survey); nor is the effect of probabilistic inductive
inference machines <Podnieks,1975>.

1 See <Rogers,1967>. Loosely speaking, a general recursive
operator O is a mapping from P to P such that: $R \subseteq$
Domain(O), O maps R to R, and O is an "enumeration
operator". An enumeration operator is a mapping from sets to
sets that formalizes the notion of enumeration reducibility.
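The notions of enumeration, partial enumeration and compatibility used in this model can be illustrated with a short Python sketch. Several simplifications are assumed and are not part of the text: a partial function is represented by a Python function that returns `None` where it is undefined (so undefinedness is decidable in the sketch, which it is not for partial recursive functions in general), the enumeration shown is the methodical one with z taken to be the identity, and all names (`methodical_enumeration`, `partial_enumeration`, `compatible`) are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterator, List, Optional, Tuple, Union

# An element of an enumeration: a pair (x, f(x)) or the symbol "*".
Element = Union[Tuple[int, int], str]


def methodical_enumeration(f: Callable[[int], Optional[int]]) -> Iterator[Element]:
    """Yield v_n = (n, f(n)), or "*" where f is undefined at n (z = identity)."""
    n = 0
    while True:
        y = f(n)
        yield "*" if y is None else (n, y)
        n += 1


def partial_enumeration(enum: Iterator[Element], n: int) -> List[Element]:
    """The finite sequence made of the first n elements of an enumeration."""
    return list(islice(enum, n))


def compatible(program: Callable[[int], int], partial: List[Element]) -> bool:
    """True if `program` agrees with every pair in the partial enumeration."""
    return all(program(v[0]) == v[1] for v in partial if v != "*")


# Hypothetical partial function: squares on even arguments, undefined on odd.
def f(n: int) -> Optional[int]:
    return n * n if n % 2 == 0 else None


f_4 = partial_enumeration(methodical_enumeration(f), 4)
square_ok = compatible(lambda x: x * x, f_4)     # a total extension of f
successor_ok = compatible(lambda x: x + 1, f_4)  # disagrees at x = 0
```

The squaring program is compatible with every partial enumeration of f because it computes an extension of f, which is all that identification asks of the index converged upon.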

The subtle nature of identification is not immediately
apparent. Since a machine M that identifies a function f has
converged upon a correct name after seeing only finitely
many input-output tuples, identification satisfies the
requirements of the informal problem statement. Yet in
general there is no effective method for judging when
sufficient input-output pairs have been input to M for M to
have ceased giving incorrect outputs. It is this which
permits identification to escape the trivial confines of the
earlier recursive interpretations of the problem. Setting an
a priori bound on the number of distinct input-output pairs
input before M outputs a correct index,^1 or requiring M to
indicate in the course of its calculations when this point
has been reached,^2 only reintroduces the problems discussed in
Chapter One.

THEOREM <Wiehagen,1978> A class C of total recursive
functions is identifiable in known time iff $C \subseteq$ some
recursively enumerable class of partial recursive functions
such that the ith and jth functions (for $i \ne j$) differ from
one another for some argument $\le r(i)$ where $r \in R$.

1 This is known as identification in fixed time.

2 This is usually known as "identification in finite time".
However, since identification takes place in finite time
even for identification in the limit, this is more
accurately described here as identification in known time.
"Time" here refers to the partial enumeration numbers.

There are a number of variations of the definitions
given for which the possibilities of identification in the
limit remain unaltered, that is solutions using the
definitions given can be effectively translated into
solutions involving the following modifications (and vice
versa).^1 Inductive inference machines may be taken to be
primitive recursive <Barzdin and Freivald,1972> or partial
recursive^2 <Minicozzi,1976> functions. For total functions
the requirement of convergence by arbitrary enumerations is
equivalent to requiring convergence by increasing <Blum and
Blum,1975>, request <Gold,1967>, methodical <Gold,1967> and
effective <Blum and Blum,1975> enumerations. Although the
definitions given do not require that in order to identify a
function an inductive inference machine M must converge to
the SAME index i for f regardless of the particular
enumeration $\hat{f}$ input to M, this can be required without
altering the results <Blum and Blum,1975>. Finally, the same
results hold even if an inductive inference machine is
permitted to output a correct index only once for a given
function while varying the remainder of its hypotheses
between only finitely many alternatives <Case and
Smith,1978>.

1 However these modifications may, for example, alter an
analysis of the solution in terms of the complexity.

2 In this case the requirement is that at least one output
be made, and that there is an algorithm index i for the
function, such that there is some point in every enumeration
of the function past which the last output is i.

As stated, the problem of identification in the limit
is enumeration independent. Certainly inductive inference
machines must be required to work for any of some fairly
general class of enumerations, in order to avoid the
theory's trivialization through certain trick classes of
enumerations that give away the answer, such as those for
which the x-value of the first pair in the enumeration is a
least upper bound for a program index of the function being
enumerated. However perhaps it is an over-reaction to insist
that inference machines must work for arbitrary
enumerations, since a less stringent requirement might
suffice^1 and it is intuitively the case that a good teaching
sequence, or order of presentation, can be a valid aid to
learning.

If in the definition of identification "every
enumeration" is changed to "primitive recursive
enumerations" then P is identifiable in the limit <Blum and
Blum,1975>. This follows from the observation that every
partial recursive function can be enumerated by a primitive
recursive function (namely that resulting from the standard
dovetailing enumerative procedure). The method is to go
through the list of the partial recursive functions provided
by some (arbitrary) listing of the primitive recursive
functions, until one compatible with the current partial
enumeration is found, upon which the partial recursive
function responsible for that particular primitive recursive
function enumeration is output <Gold,1967>.

1 For example, classes of enumerations containing only
partial enumerations that can be translated algorithmically
into a correct program index could be declared
insufficiently general.

It  is  customary  in  functional  studies  to  assume  that  an 
inductive  inference  machine  may  choose  its  hypotheses  from 
all  of  P.  Tampering  with  this  assumption  can  alter  the 
machine's  power,  as  is  shown  by  a  comparison  of  the 
following  result  with  the  subsequent  characterizations  of 
ID. 

THEOREM <Case and Smith,1978> Given any Popperian machine M
$\exists$, uniformly in M, a recursive function that enumerates the
class C of functions M identifies.

This follows from the creation of C by the extension, by
means of M, of every "finite initial function"
(i.e. functions whose domains are some finite initial portion
of N), the class of which is recursively enumerable.

Mention should also be made here that although it is
usually assumed that the inductive inference machine has
access to the entire partial enumeration $\hat{f}_n$ to make its nth
hypothesis, the effect of "memory constraints" is
investigated in <Wiehagen,1975>. Call the class of sets of
total recursive functions identifiable when an inductive
inference machine is only permitted to see the next element
in $\hat{f}_n$, or only one element of its own choice, ITERATE and
FEED-BACK respectively. Then CONSISTENT $\subset$ ITERATE $\subset$
FEED-BACK $\subset$ ID, where CONSISTENT is defined shortly, and the
containments are strict.

It is worthwhile detailing the exact relationship of
limiting recursion to identification in the limit.

THEOREM <Wiehagen,1978> For any class C of partial recursive
functions, ∃ a limiting recursive functional F such that
$F(f) \in \{i : t_i = f\}$ for all f ∈ C, iff C can be
identified in the limit.

The limiting recursive functional of the theorem is defined
by $F(f) = \lim_n M([\tilde{f}_n])$ where M is an inductive
inference machine that identifies C and $\tilde{f}$ is any
enumeration of f.

The immediate question is: Which classes of functions
can be identified in the limit? A partial answer is provided
by:

THEOREM <Gold,1967> Any class included in a recursively
enumerable class C of total recursive functions can be
identified in the limit. ^

This is clear by the following argument. Given a partial
enumeration $\tilde{f}_n$, enumerate C, checking the algorithms
one by one for compatibility with $\tilde{f}_n$, and output the
index of the first compatible algorithm. Since the functions
in C are total recursive, these checks always terminate, and
since an index for the function f will eventually be reached
(as all previous programs that do not compute f must compute
a value different from f at some point in f's enumeration and
so be discarded) the sequence of hypotheses resulting from
successive partial enumerations must converge as required.
Notice that it is not decidable in general whether the
current hypothesis is correct.

^ In fact, any class of total recursive functions that is
r.e. in K can be identified in the limit <Case and
Smith,1978>.
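The enumeration argument can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration only: the toy class C (index i names the map x -> (i * x) % 7) and the helper names `program`, `compatible`, `identify_by_enumeration` and `hypotheses` are hypothetical and do not come from the cited papers. Because every function in the toy class is total, each compatibility check terminates, as the argument requires.

```python
from itertools import count

# Hypothetical r.e. class C of total functions:
# index i names the map x -> (i * x) % 7.
def program(i):
    return lambda x: (i * x) % 7

def compatible(i, data):
    """True if program i agrees with every observed (x, f(x)) pair."""
    f = program(i)
    return all(f(x) == y for x, y in data)

def identify_by_enumeration(data):
    """Return the least index whose program is compatible with the data.

    Terminates only if some program in the class fits the data,
    which holds whenever the target belongs to the class.
    """
    for i in count():
        if compatible(i, data):
            return i

def hypotheses(target, n_steps):
    """Feed the increasing enumeration of `target`, recording each guess."""
    data, guesses = [], []
    for x in range(n_steps):
        data.append((x, target(x)))
        guesses.append(identify_by_enumeration(data))
    return guesses

print(hypotheses(program(5), 4))  # [0, 5, 5, 5]
```

The first guess is wrong but compatible with the single data point seen; once a distinguishing point arrives the guess moves to index 5 and never changes again. Nothing in the procedure signals that convergence has occurred, which mirrors the remark that correctness of the current hypothesis is not decidable in general.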

The "enumeration technique" described above is
beguiling in its simplicity. It is not, however, a maximally
powerful method even on R. Define NUM to be the class of
sets of total recursive functions that can be identified by
enumerating some class of partial recursive functions and
choosing the first one found to be compatible with the
current data.

THEOREM <Blum and Blum,1975> A set S of total recursive
functions ∈ NUM iff ∃ a total recursive function h such
that every function f ∈ S is h-honest.

This result renders enumeration ineffective, for example,
for any classes containing "arbitrarily difficult to
compute" <Hartmanis and Hopcroft,1971> total recursive
functions. The "self-describing functions" ^ form such a
class that nevertheless is trivially identifiable. <Barzdin
and Freivald,1972> contains the first mention of the
existence of classes of total recursive functions that,
although identifiable in the limit, are not included in any
recursively enumerable class of total recursive functions,
and for which enumeration is therefore clearly inapplicable.

^ These are functions for which the least x such that f(x)=1
is an index for a program for f. Obviously, for any partial
recursive function there is a self-describing function that
is almost everywhere identical.

NUM and IDknown ^ are incomparable <Wiehagen,1978>.

^ This is the class of sets of total recursive functions
that are identifiable in known time. Classes corresponding
to other restrictions are indicated similarly, by the
concatenation of "ID" with the appropriate term.

In <Blum and Blum,1975> a more general technique,
called "A Posteriori Inference", is devised that depends
upon a "running bound" on the target function's complexity
by means of general recursive operators. Although the method
is defined to work only for total recursive functions, on
these it is extremely powerful.

THEOREM <Blum and Blum,1975> ∀ general recursive operators
O, ∃ M, uniformly in O, such that f ∈ R is everywhere
O-compressed implies M identifies f in the limit.

The converse of this result is also true and is stated
later.

However, no method, however clever, will ever work for
all total recursive functions since:

THEOREM <Gold,1967> R is not identifiable in the limit.

Since the proof is instructive and appears only slightly
modified for several other results, it will be sketched
here. Suppose there is some inductive inference machine M
that identifies R in the limit. A recursive function will be
(ineffectively) constructed that M does not identify in the
limit:

Let $f^1$ be the function whose increasing enumeration is
$(i_1, i_2, \ldots, i_n, 0^{\mathrm{inf}})$ where
$0^{\mathrm{inf}}$ is the infinite string 0,0,0,..., and each
$i_j$ is either 0 or 1 (arbitrary). M must identify all
functions looking like this, so suppose M guesses correctly
for $\tilde{f}^1_{x_1}$.

Let $f^2$ be the function whose increasing enumeration is
$(i_1, i_2, \ldots, i_n, 0^{x_1}, 1^{\mathrm{inf}})$, where
$0^{x_1}$ is a string of $x_1$ 0's separated by commas.
Suppose M guesses correctly for $\tilde{f}^2_{x_2}$.

Let $f^3$ be the function whose increasing enumeration is
$(i_1, \ldots, i_n, 0^{x_1}, 1^{x_2}, 0^{\mathrm{inf}})$.
Suppose M guesses correctly for $\tilde{f}^3_{x_3}$.

And so on. Let $f^* = \lim_n f^n$. $f^*$ is total recursive
since following the above procedure permits the calculation
of $f^*(x)$ for any x. Yet when the increasing enumeration of
$f^*$ is fed to M, by the construction of $f^*$ M does not
converge to any index. M's failure demonstrates the
contradiction inherent in the assertion that an
inductive inference machine exists that identifies R.
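The alternating construction can be mimicked on a toy scale. The sketch below rests on simplifying assumptions that are not part of the original proof: hypotheses are represented as hypothetical canonical descriptions (head, tail) meaning "head, then tail forever", so that "M guesses correctly" becomes a simple equality test (in general it is not decidable); the initial segment i_1,...,i_n is taken to be empty; and `last_value_learner` is an invented stand-in for M.

```python
def canonical(values, tail):
    """Describe 'values, then tail forever' as (head, tail), dropping redundant tail copies."""
    head = list(values)
    while head and head[-1] == tail:
        head.pop()
    return (tuple(head), tail)

def last_value_learner(values):
    """Hypothetical learner M: guess that the last value seen repeats forever."""
    if not values:
        return canonical([], 0)
    return canonical(values, values[-1])

def force_mind_changes(learner, rounds, max_wait=1000):
    """Build a prefix of f* on which `learner` adopts `rounds` successive guesses."""
    data, locked = [], []
    bit = 0
    for _ in range(rounds):
        target = canonical(data, bit)     # current f^k: data so far, then `bit` forever
        for _ in range(max_wait):
            data.append(bit)
            if learner(data) == target:   # M has locked onto f^k
                break
        else:
            raise RuntimeError("learner never locked on, so it fails on f^k itself")
        locked.append(target)
        bit = 1 - bit                     # switch the tail to defeat that guess
    return data, locked

data, locked = force_mind_changes(last_value_learner, 4)
print(data)         # [0, 1, 0, 1]
print(len(locked))  # 4
```

Each round ends as soon as the learner commits to the current candidate, after which the tail is flipped. Running more rounds extends the prefix and forces one further change of hypothesis per round, which is the sense in which the limit function defeats the learner.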

ID has been characterized in several ways. One method
uses a function's complexity.

Definition: For a general recursive operator O define
$R_O^{\max} = \{t_i : \forall n\ \max_{x<n} T_i(x) \le O(t_i)(n),\ t_i \in R\}$,
where max is taken over all x<n.

THEOREM <Wiehagen,1978> A class C of total recursive
functions ∈ ID iff ∃ a general recursive operator O such
that C ⊂ $R_O^{\max}$.

The operator that establishes the necessity of this
condition is simply: $O(f)(n) = \max\{T_i(x) : x<n\}$, where
$i = M([\tilde{f}_n])$.

In  passing  it  should  be  noted  that  this 
characterization  gives  the  most  powerful  (on  R)  inductive 
inference  machines  or  strategies  possible. 

Since  very  often  there  are  no  estimates  of  the 
computational  complexity  of  the  target  function,  other  sorts 
of  characterizations  of  ID  have  been  derived. 

<Wiehagen,1978> identifies the relevant literature, almost
all of which is East European, recent, and untranslated. A
typical result shown in <Wiehagen,1978> is:

THEOREM A class C of total recursive functions ∈ ID iff C ⊂
some recursively enumerable class of partial recursive
functions such that the ith and jth functions (i ≠ j) differ
from one another for some argument ≤ r(i,j) where r ∈ R^2.

The previous material should not suggest to the reader
that only classes of total recursive functions can be
identified. Quite the contrary. It is merely that the
situation for R is easier to analyze, and provides an upper
bound to identification results in the sense that no
identifiable class of partial recursive functions can
contain all of R.

However, unlike the situation with respect to R and ID,
there is no characterization for P of the identifiable
classes. * However, if the object is to identify a class
containing any strictly partial recursive functions, then it
seems there must be some way to bound the complexity of the
functions wherever they are defined. A simple way to do this
is via the notion of h-honesty.

* Although Wiehagen <1978> claims that most of his results
with respect to R can be duplicated for P.

THEOREM <Blum and Blum,1975> For every 2-place total
recursive function h, ∃ M, uniformly in h, such that M
identifies the class of h-honest functions.

This is called "A Priori Inference" since the idea is to use
the a priori bound that h provides on the complexity to
disallow any hypothesis that "takes too long" to compute the
current partial enumeration. The converse statement is also
true and is stated later.

The methods used in the demonstration of the various
results entail relatively enormous amounts of calculation.
That is, they establish the possibility, not the
feasibility, of identification for various classes. This is
a feature of the subject at present <Coy,1979> and is
discussed further in the "Implementations" section of the
next chapter.

Whether there can EVER be efficient, practical
inductive inference machines that identify non-trivial
classes of functions is an issue that, to avoid too great a
digression, is only touched on very briefly here. Many of
the good encoding references cited in Chapter One bear upon
this question. An early result by Gold <1967> states:

THEOREM If M is an identification by enumeration machine (of
the sort described earlier) that identifies a class of total
recursive functions C, then there is no inductive inference
machine M' identifying C such that:

1) for all f ∈ C, if M converges to a proper index for
f given some partial enumeration from some enumeration
of f, then so does M'

2) for some g ∈ C, M', given some partial enumeration
from an enumeration of g, converges to a proper index
for g while M does not.

The maximum number of hypothesis changes involved in the
identification of any function in a given class is
investigated in <Barzdin and Freivald,1972; Barzdin,1974>.
Since the worst case behavior approximates trying every
function in the class, Barzdin gloomily concludes that
input/output listings do not suffice to design economical
inductive inference machines, and goes on to suggest ways of
improving efficiency through additional information such as
program histories, or by changing the character of the
inductive inference machine from a total to a partial
function (there is a class of functions for which the latter
requires arbitrarily fewer changes to successfully perform
an identification <Barzdin and Freivald,1972>). That the
problem of identification is likely to be insoluble in
practical terms without some clever modification is also
suggested by:

THEOREM <Gold,1978> The determination of whether there
exists a deterministic finite state automaton of at most t
states compatible with a given partial enumeration is
NP-complete.

Angluin <1979> shows that analogous results hold even when
constraints are placed upon the "density" of the partial
enumeration, and surveys the general question. Although
initially pessimistic about the possibilities of a
practically worthwhile inductive inference machine, Angluin
<personal communication> is currently hopeful that such
machines may yet be designed for very special, yet useful,
classes. Pudlak <1975> shows that NP-complete problems exist
in the logics of discovery mentioned earlier.

Considerations with respect to a candidate solution's
"complexity" often go hand in hand with analyses of the
potential efficiency of discovery procedures <e.g.
Kinber,1974>. Again the discussion here will be extremely
brief. There are two distinct conceptions of program
minimality employed, corresponding to the distinction
between the "size" of programs in some representation versus
their "efficiency" <Hartmanis and Hopcroft,1971>. For
example, "size" might be measured by the number of symbols
in a program's description or its position in some standard
numbering of P. Size measures are customarily labelled
"intrinsic". "Efficiency" is defined in terms of some
computational complexity measure with respect to time or
space. Efficiency measures are customarily labelled
"derivational". "Total" measures are given by some function
(usually linear) of both the intrinsic and derivational
measures. The impact of minimality requirements on a
solution is frequently investigated for intrinsic measures
(e.g. the inductive inference machine must settle upon not
only a correct index, but the least correct one in some
ordering of P) <Schubert,1974; Freivald,1975>. <Feldman et
al.,1969; Feldman,1972; Feldman and Shields,1977> consider
minimal identification with respect to total complexity
(specifically with respect to size and run times). Such
studies shade into those exclusively concerned with good
encoding.

2.2.1.1 "Consistent" and "Reliable" Identification

Upon  reflection,  one  realizes  that  the  requirements  for 
identification  in  the  limit  permit  several  possibly 
undesirable  types  of  solution.  Although  an  inductive 
inference  machine  must  ultimately  hypothesize  a  correct 
program  index  for  any  function  that  it  identifies,  the 
interim  hypotheses  may  be  totally  bogus,  as  may  be  the 
responses  to  functions  that  M  cannot  identify.  For  example: 
The self-describing functions can be identified by the,
in a sense trivial, machine M that outputs anything
until, if ever, a tuple of the form (x,1) appears in a
partial enumeration, and thereafter outputs the smallest
such x that appears in any of the partial enumerations.
Until M reaches the correct value of x for a
self-describing function, a program hypothesized by M may
not even agree with the current partial enumeration, that
is, M's hypotheses may not even be compatible with the
available data. Moreover M converges to incorrect
hypotheses for many non-self-describing functions.
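The trivial machine in this example is short enough to write out. This is only a sketch: the function name and the `default` placeholder guess (the text merely says the machine "outputs anything" before a pair (x,1) appears) are assumptions made for illustration.

```python
def trivial_self_describing_machine(partial_enumeration, default=0):
    """Guess the least x with (x, 1) in the data; before any such pair, guess `default`."""
    candidates = [x for x, y in partial_enumeration if y == 1]
    return min(candidates) if candidates else default

# An enumeration (in arbitrary order) of a function whose least x with f(x) = 1 is 3.
enumeration = [(5, 1), (0, 0), (3, 1), (1, 0)]
guesses = [trivial_self_describing_machine(enumeration[:n]) for n in range(1, 5)]
print(guesses)  # [5, 5, 3, 3]
```

The early guess 5 is returned without any check that program 5 agrees with the data, which is exactly the inconsistency described above. On a function that is not self-describing, the machine still converges to the least x with f(x) = 1, whether or not that x is an index for f.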

Machines that avoid these features are called "consistent" ^1
and "reliable" ^2 respectively.

Definition: An inductive inference machine M is
consistent if (M identifies a partial recursive function
f) implies (the program $M([\tilde{f}_n])$ is compatible with
$\tilde{f}_n$ ∀ n).

Definition: An inductive inference machine M is reliable
on a class of functions C, if, for every enumeration
$\tilde{f}$ of each f ∈ C, M converges iff M identifies f.

Reliability  and  consistency  are  intimately  related. 

THEOREM <Blum and Blum,1975> For any inductive inference
machine M, if M is consistent then M is reliable on P, and
if M is reliable on P then ∃ M', uniformly in M, such that
M' is as powerful as M, and M' is consistent.

To obtain either consistency or reliability it is a
necessary and sufficient condition that M identify the class
of finite functions <Blum and Blum,1975>.

^1 "Overkill" <Blum and Blum,1975>, "feasible" <Gold,1976>,
and "regular" <Kinber,1974> are also used.

^2 "Strong" <Minicozzi,1976> is also used.

It is perhaps reasonable to hope a priori that some
effective method exists to ensure that an inductive
inference machine is consistent (reliable on P), since for
example, the large class of enumeration machines are
consistent by construction. This is not possible however.

The class of self-describing functions provides an example
of a class that cannot be identified by a consistent machine
(this follows immediately from the proof of the Non-Union
Theorem, given in Section 2.2.1.2, and the previous
equivalence to reliability).

THEOREM <Case and Smith,1978> There is no algorithm that,
given an inductive inference machine M, specifies a function
that M fails to identify.

A numbering theoretic characterization of the classes of
functions that can be identified by a machine reliable on P
(i.e. consistent) is to be had in <Wiehagen,1977>. The
following characterization is given in terms of the
complexities of the functions involved; its converse was
stated previously as the method of "A Priori Inference".

THEOREM <Blum and Blum,1975> If a class C of functions can
be identified by a machine M reliable on P then ∃, uniformly
in M, h such that f ∈ C implies that f is h-honest, for h a
total recursive, 2-argument function.

This theorem operates in much the same manner as
Wiehagen's result stated earlier. The key factor is that if
a function is h-honest, then the complexity of some
extension is bounded by the maximum of some constant (from
the "almost everywhere" condition) and h(x,f(x)). By
enumerating through tuples of the form (i,c), where i stands
for a program index and c the sought after constant, and
checking whether or not program i is compatible with the
data or requires computational time exceeding the above
allowable bound, one must eventually settle on a program for
some extension of f as desired. And, of course, if a machine
M identifies a class of functions C and M is reliable on P
then the requisite function h is given by h(x,y) =
(the maximum complexity encountered by any machine
hypothesized by M, given a partial enumeration each of whose
elements is < (x,y), when working upon the input values of
these partial enumeration elements).

This shows, for example, that reliable (on P) machines
cannot identify arbitrarily complex 0-1 valued recursive
functions.

Reliability on R is characterized by the formulation of
the converse to the method of "A Posteriori Inference"
mentioned previously, along exactly the same lines as the
result just explicated except using general recursive
operators rather than total recursive functions. Machines
reliable on R can be much more powerful than those reliable
on P or even the class T of total functions. And machines
reliable on Pinf = P - {f : f is a finite function}, while
more powerful than consistent machines, are nevertheless not
as powerful as those reliable only on R (for which some
arbitrarily difficult to compute functions may be
identified) <Blum and Blum,1975>.

To summarize then: NUM ⊂ IDconsistent = IDreliableonP ⊂
IDreliableonPinf ⊂ IDreliableonR ⊂ ID; IDknown ⊂
IDconsistent; and IDreliableonT ⊂ IDreliableonR, where the
containments are strict.

Consistency  is  evidently  a  rather  strong  requirement  to 
place  upon  an  inductive  inference  machine.  A  related  but 
less  stringent  requirement  that  nevertheless  prevents  an 
inductive  inference  machine  from  "contradicting  the 
evidence", is called "conformability" by Wiehagen <1978>. A
machine  M  is  "conformable"  if  M's  hypotheses  are  always 
either  compatible  with  each  partial  enumeration  or  are 
possibly  undefined  at  some  of  the  data  points.  The  power  of 
conformable  machines  is  strictly  between  that  of 
unrestricted  and  consistent  machines. 

2.2.1.2 Communal Identification

The  previous  material  indicates  that  alone,  any 
inductive  inference  machine  has  definite  limitations.  Yet 
such  a  lone  machine  may  adequately  model  neither  a 
scientific  nor  a  linguistic  community.  The  Non-Union  Theorem 
states  that: 

THEOREM <Blum and Blum,1975> {self-describing functions} ∪
{finite functions} is not identifiable.

This is despite the obvious identifiability of these two
sets individually. A simple way to see the truth of this is
that such a machine would have to be consistent (since it
identifies the finite functions) yet consistency is
unattainable for any machine that identifies the
self-describing functions. There is, therefore, a difference
in the power of an individual versus that of a collection of
individuals.

This  difference  in  power  exists  only  for  unreliable 
machines. 

THEOREM <Minicozzi,1976> Given any recursively enumerable
class $\mathcal{M}$ of inductive inference machines, each of
which is reliable on a class C of functions, then ∃,
uniformly in $\mathcal{M}$, an inductive inference machine M'
that is reliable on C and is as powerful in C as any of the
machines belonging to $\mathcal{M}$.

The idea behind the machine M' implementing this "Union
Theorem" is to gradually feed a function's enumeration to
more and more of the machines in $\mathcal{M}$, and by
checking to see whether or not a given machine's last two
hypotheses are the same, try to settle on a machine that is
converging.
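A rough sketch of such a combining machine follows. It is not Minicozzi's construction: as a simplifying assumption it follows the machine that minimizes (its index + the time of its last mind change), a variant of the "last two hypotheses" check that is easier to state. The names `last_change` and `union_machine`, and the use of a finite Python list in place of a recursively enumerable class, are likewise hypothetical.

```python
def last_change(learner, data):
    """Length of the shortest prefix of `data` after which `learner` keeps the same guess."""
    guesses = [learner(data[:n]) for n in range(len(data) + 1)]
    n = len(data)
    while n > 0 and guesses[n - 1] == guesses[n]:
        n -= 1
    return n

def union_machine(learners, data):
    """Follow the learner minimizing (index + time of last mind change)."""
    pool = learners[: len(data) + 1]   # admit more machines as the data grows
    best = min(range(len(pool)), key=lambda i: i + last_change(pool[i], data))
    return pool[best](data)

# Two toy machines: one changes its guess at every step, one settles at once.
restless = lambda data: ("guess", len(data))
steady = lambda data: "p7" if data else "p0"

data = [(0, 1), (1, 1), (2, 1), (3, 1)]
print([union_machine([restless, steady], data[:n]) for n in range(1, 5)])
# [('guess', 1), ('guess', 2), 'p7', 'p7']
```

A machine that keeps changing its mind accumulates an ever larger score, so the combined machine eventually abandons it in favor of one that has stabilized. Reliability of the component machines is what would guarantee that a stabilized guess is also a correct one; the sketch does not check this.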

The informal idea of a group of inductive inference
machines identifying a function has been made precise in two
quite disparate ways: First by the requirement that some
machine, the group's "expert" for that function, identify
the function. And second by the requirement that almost all
of the machines identify the function in the limit. Case and
Smith <1978> investigate the consequences of the first
conception for static, finite groups of inductive inference
machines; while Schubert <1974> and Kugel <1977>, via
Schubert's notion of 2-limiting recursive strategies, in
effect utilize the second for expanding or potentially
infinite groups. ^1 In <Case and Smith,1978> a number of
claims are made relating the classes of functions that can
be (almost everywhere) identified with n machines restricted
to at most m hypothesis changes (m possibly unbounded) and
(almost everywhere) matched (see next subsection) with n
machines. Of special relevance here are the claims that:
2n+2 machines allowed only m discrepancies (cf. 2.3) in the
solution have greater identification power than n+1 machines
allowed m+1 discrepancies; the identification power of n+2
machines not allowed any discrepancies is greater than the
union, over d ∈ N, of the matching powers (cf. 2.2.2) of n
machines allowed d discrepancies; and MATCH (cf. 2.2.2) is
larger than the union, over n ∈ N, of the identification
powers of n machines allowed any (finite) number of
discrepancies.

^1 And k-limiting recursive strategies generally embody
conceptions of communal or supra-communal identification.

2.2.2 Matching

Identification in the limit requires that an inductive
inference machine converge to a particular correct program
for a target function. Matching ^2 requires only that almost
all the hypotheses are correct, i.e.

Definition: An inductive inference machine M matches a
function f, if for any enumeration $\tilde{f}$ only a finite
number of $M([\tilde{f}_n])$, for n=1,2,3..., are not program
indices for f.

^2 <Feldman et al.,1969> is the first use of both the notion
and name. The idea appears elsewhere <e.g. Barzdin and
Freivald,1972; Case and Smith,1978> although the exact
definitions and names employed vary.

Definition: The matching-power of an inductive inference
machine M is the class of all sets of functions that can
be matched by M.

Definition: MATCH is the class of sets of total
recursive functions that are matchable.

THEOREM <Barzdin,1974> MATCH strictly includes ID.

Any  class  of  "almost  everywhere  identifiable"  (defined  later 
in  this  chapter)  but  not  identifiable  functions,  provides  an 
example  of  a  class  of  functions  that  can  be  matched  yet  not 
identified  in  the  limit. 

Yet  matching  is  curiously  similar  to  identification  in 
that,  for  example,  a  re-examination  of  the  argument  used  to 
demonstrate  the  impossibility  of  any  machine  that  identifies 
R,  reveals  that  by  virtually  the  same  argument: 

THEOREM <Feldman et al.,1969> R is not matchable.

Neither  matching,  nor  the  model  to  be  outlined  next, 
"extrapolation",  is  as  fully  developed  as  identification. 
Consequently,  many  of  the  issues  in  identification  have  not 
been  investigated  in  these  contexts.  No  concept  of 
reliability exists. <Feldman and Shields,1977> is one of the
few papers on matching in the presence of complexity
constraints. Consistency, on the other hand, can always be

^ Suppose M matches f. Define M' by the following program
description: Given $\tilde{f}_n$, calculate
$i = M([\tilde{f}_n])$ and output an index for the program
t = lambda x [f(x) if (x,f(x)) ∈ $\tilde{f}_n$; $t_i(x)$
otherwise].

2.2.3 Extrapolation

To be compatible with the original problem statement,
an inductive inference machine must discover a name for f.
Both identification and matching have assumed that the
inductive inference machine must explicitly generate such
names as hypotheses. However, extrapolation rests upon a
subtler interpretation, namely that at some point in the
enumeration the inductive inference machine must itself have
become a name for a function that is almost everywhere equal
to the target function. In intuitive terms, the difference
is that between the linguist who constructs explicit
grammars, and the child who is merely seen to obey some
grammar.

Definition: A total enumeration of a function f is an
enumeration of f for which every element is of the form
(x,f(x)).

Definition: A query partial enumeration, $q\tilde{f}_n$, of a
function f, is the finite sequence consisting of (the
first n-1 elements of a total enumeration of f)
concatenated with $(x_n, ?)$ where $(x_n, f(x_n))$ is the nth
element of the particular total enumeration of f in use.

Definition: An inductive inference machine M
extrapolates a function f if for every total enumeration
of f ∃ n such that $M([q\tilde{f}_m]) = f(x_m)$ ∀ m>n. *

Definition: EXTRAP is the class of sets of total
recursive functions that are extrapolatable.

Extrapolation predates both identification and matching
and has particularly close ties with the new computational
models of randomness mentioned in Chapter One
<Solomonoff,1964>.

The relationship of extrapolation to identification is
simple.

THEOREM <Case and Smith,1978> A set S of functions can be
extrapolated iff S can be identified in the limit by a
Popperian machine.

EXTRAP can also be characterized in ways similar to
those used for ID, namely:

THEOREM <Barzdin and Freivald,1972> A class C of functions ∈
EXTRAP iff C is included in a recursively enumerable class
of total recursive functions.


* This concept has been variously defined. For example, in
<Blum and Blum,1975> it is defined with respect to
increasing enumerations. <Barzdin and Freivald,1972> define
it much as it is defined here.

THEOREM <Blum and Blum,1975>

i) If an inductive inference machine M extrapolates a
class C of functions, then ∃, uniformly in M, a total
recursive function h such that f ∈ C implies f is h-easy.

ii) If h is total recursive, then ∃, uniformly in h, an
inductive inference machine M such that f is h-easy
implies M extrapolates f.

To obtain the function h from M it suffices to note that
from the recursive function that enumerates C, h can be
defined as h(x)=(the maximum complexity involved in the
computation at x by any of the first x machines enumerated).
Conversely, the procedure for obtaining M from h follows the
pattern seen several times before. That is, to compute
$M([q\tilde{f}_x])$, begin enumerating all tuples (i,n), where
i stands for a program index and n for the complexity bound
adjustment induced by its "almost everywhere" nature. Look
for a combination for which both $T_i(y) \le \max(n, h(y))$
∀ y ≤ x and $t_i$ is compatible with $q\tilde{f}_x$. If and
when found, output $t_i(x)$.

Corollary EXTRAP is strictly included in ID.

From the preceding characterization it can be seen that the
class of "step-counting" functions, for example, cannot be
extrapolated, yet is easily identifiable by listing P and
calculating only for the "time" supplied by the partial
enumeration.

Here perhaps it should be noted that the definitional
variants said to have no effect upon the possibility of
identification, may very well affect the other models.

THEOREM <Barzdin and Freivald,1972> EXTRAP includes ID for
partial recursive inductive inference machines.

No  proof  accompanied  this  assertion.  In  fact,  the  inclusion 
is  strict  since: 

THEOREM EXTRAP includes MATCH for partial recursive
inductive inference machines.

Proof: Suppose C ∈ MATCH, f ∈ C, and M is a machine that
matches C.

For q_f,n: Let M([q_f,n]) = i. (Wlog M may be assumed total.)
Output t_i(x_n) if the computation halts.

By the definition of matching ∃ N such that n ≥ N implies
t_i = f, and so the extrapolated values past this point are
both defined and correct.//

With  the  definition  of  almost  everywhere  identification  and 
matching  in  the  next  subsection,  it  becomes  clear  that  this 
containment  is  also  strict,  since  the  same  method  seems  to 
work  for  showing  containment  of  the  classes  corresponding  to 
these  approximate  learning  criteria  in  EXTRAP. 

In  a  fashion  similar  to  that  for  identification,  the 
difficulty  of  extrapolation  has  been  estimated  by  bounding 
the  maximum  number  of  erroneous  answers  given  while 
extrapolating any function within the class <Barzdin and
Freivald,1972>. But there has been much less work for
extrapolation as compared to identification on the effect of
definitional variants and the potential difficulty of the
task.

2.3  Approximate  Variations  of  the  Main  Models 

The  previous  models  are  linked  by  the  requirement  that 
an  inductive  inference  machine  output  nothing  but  completely 
correct  hypotheses  past  some  point  in  any  enumeration. 
Approximate  learning  relaxes  the  "completely  correct" 
prerequisite  to  varying  degrees.  This  is  desirable  since, 
for example, R is not learnable with respect to any of the
main models.

Perhaps the simplest yet least satisfactory thing to do
is to select some "privileged" finite subset S of the
natural numbers and consider any function that agrees with
the target function f on S to be a "suitable" name for f.
Since this arises in the context of language identification
<Wharton,1974> its discussion is deferred until Chapter
Three.

Hypotheses  that  are  guaranteed  to  agree  with  the  target 
function  only  on  some  finite  domain  seem  rather 
unsatisfactory.  Hypotheses  that  disagree  with  the  target 
function  at  only  finitely  many  places  perhaps  have  more 
appeal. This is what "almost everywhere" identification and
matching permit.

Definition: Given two functions f and g, the
discrepancies between f and g are those x ∈ Domain(f)
such that f(x) ≠ g(x).

Definition: Given two functions f and g, f =_n g for n ∈
N if ∃ at most n discrepancies between f and g. f =_* g
if ∃ n such that f =_n g.
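As a small illustration of these two definitions, the following sketch models partial functions as finite Python dictionaries (an assumption made purely for the example; a missing key stands for "undefined"). Note that a point where g is undefined counts as a discrepancy, and that the relation is taken over Domain(f), so it need not be symmetric.

```python
def discrepancies(f, g):
    """Points x in Domain(f) at which g is undefined or g(x) != f(x)."""
    return {x for x, fx in f.items() if x not in g or g[x] != fx}


def equal_up_to(f, g, n):
    """f =_n g: there are at most n discrepancies between f and g."""
    return len(discrepancies(f, g)) <= n


# Hypothetical finite functions used only for illustration.
f = {0: 0, 1: 1, 2: 4, 3: 9}
g = {0: 0, 1: 1, 2: 5}        # wrong at 2, undefined at 3
```

Here discrepancies(f, g) is {2, 3}, so f =_2 g holds but f =_1 g does not, while discrepancies(g, f) is only {2}. For finite tables =_* is of course trivial; it becomes a genuine condition only for functions on infinite domains.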

Definition: An inductive inference machine M almost
everywhere (d) identifies a function f in the limit if
for every enumeration of f ∃ i such that M converges to
i, and i is an index for a program that computes some
extension of a function g such that f =_* g (f =_d g).

Almost  everywhere  matching  is  defined  analogously. 

Definition: ID_*, ID_d, MATCH_*, MATCH_d are defined as are
ID and MATCH except that the words "almost everywhere
(d)" are inserted in the pertinent locations.

(These are also known as "sub-identification"
<Minicozzi,1976>, "anomalous explanatory" and "behavioral"
"identification mod ≤ n anomalies" <Case and Smith,1978>
respectively.)

Permitting even a single discrepancy between the target
and hypothesis results in more powerful inductive inference
machines.

THEOREM <Case and Smith,1978> ID_(n+1) strictly includes ID_n ∀ n
∈ N.

A set very similar to the self describing functions
establishes this for n=1 by being almost everywhere (1)
identifiable but having at least one function, for every
inductive inference machine M, that is not identifiable by
M. This corroborating set, S, is the set of all recursive
functions f whose value on 0 is an index for a program that
computes f at all save perhaps one point. S is trivially
almost everywhere (1) identifiable. However, given a machine
M, a function f ∈ S that M does not identify can be
(ineffectively) constructed:

This f is defined by the program that outputs its own
index at 0 (via the recursion theorem) and is defined
elsewhere by a program that constructs an ever larger
input/output finite sequence containing a single
"anomaly" (i.e. an x value for which no (x,y) value is
given in the infinite sequence), moving it iff the
definition of some y value at that x will cause M's last
hypothesis to be incompatible with the new finite
sequence, or if the new finite sequence causes M to
output a new hypothesis. If the anomaly never settles,
then by construction M never converges and so does not
identify the function that the infinite sequence
enumerates. However, if there is some anomaly at which
no y-value definition can either force M to change its
mind or be wrong in its last hypothesis, it must be
because M's hypothesis is undefined at that point. In
this case the function that is identical to the function
defined by the infinite sequence constructed EXCEPT that
it equals 0 at the anomaly, is a function that M
misidentifies. Yet this function is still clearly almost
everywhere (1) identifiable as before.

An  extension  of  this  method  leads  to: 

THEOREM <Case and Smith,1978> ID_* strictly includes ∪ ID_n, n
∈ N.

And the corresponding strict containments hold for
matching also, i.e.:

THEOREM <Case and Smith,1978> MATCH_(n+1) strictly includes
MATCH_n, for n ∈ N.

THEOREM <Case and Smith,1978> MATCH_* strictly includes ∪
MATCH_n, n ∈ N.

So great is the power conferred by the acceptability of
a finite number of discrepancies between the target and
hypothesis that strong constraints such as reliability (i.e.
convergence implies almost everywhere identification) can
be imposed and still permit powerful identification results.

THEOREM <Minicozzi,1976> ∃ S such that S is almost
everywhere identifiable by a machine reliable on P, yet S is
not identifiable.

Minicozzi infers the existence of such sets from:

THEOREM <Minicozzi,1976> If a set S of functions can be
identified and ∃ a machine reliable on P that almost
everywhere identifies precisely S, ∃ a machine reliable on P
that identifies S.

Despite this however, it is not possible to almost
everywhere identify R. In fact:

THEOREM <Case and Smith,1978> MATCH includes ID_*.

This  is  easy  to  see  since  from  a  machine  M  that  almost 
everywhere  identifies  a  class  C  of  functions,  M'  can  be 
defined  which  takes  the  output  of  M,  splices  in  a  table 
containing  the  current  partial  enumeration  values,  and 
outputs  an  index  for  the  resulting  function.  Eventually  the 
last  of  the  discrepancies  must  have  gone  past  in  any 
enumeration,  and  past  that  point  M'  outputs  completely 
correct  hypotheses. 
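The splicing step can be pictured as follows. This is a minimal sketch, assuming hypotheses are modelled as Python callables rather than program indices (the real M' would output an index for the patched program); the names patch, flawed and target are hypothetical.

```python
def patch(hypothesis, observed):
    """Return a function that consults the table of observed values first
    and falls back on the original hypothesis everywhere else."""
    table = dict(observed)

    def patched(x):
        if x in table:
            return table[x]
        return hypothesis(x)

    return patched


def target(x):
    return x * x


def flawed(x):
    """Agrees with target except at 3 (undefined) and 5 (wrong value)."""
    if x == 3:
        raise ValueError("undefined at 3")
    if x == 5:
        return 0
    return x * x


# Once the partial enumeration has passed the last discrepancy (here, 5),
# the patched hypothesis is correct everywhere.
seen = [(x, target(x)) for x in range(7)]
repaired = patch(flawed, seen)
```

The table covers both kinds of discrepancy, a wrong value and an undefined point, which is why the patched hypothesis is completely correct once the enumeration has gone past them.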

<Blum and Blum,1978> gives a general condition on the
complexities of the functions in a class C which is
sufficient for C to be almost everywhere identified by a
machine reliable on R. ID_* is characterized in
<Wiehagen,1978> as follows:

It is assumed that the complexity measure used results in
complexity classes R_t satisfying the condition that [f ∈
R_t iff g ∈ R_t, holds ∀ f,g,t ∈ P such that f(x) = g(x) for
almost all x.]

THEOREM <Wiehagen,1978> A class C of total recursive
functions is almost everywhere identifiable iff ∃ an
effective operator O such that C ⊆ {t_i : T_i(n) ≤ O(t_i)(n)
for almost all n} ∩ R.

Case and Smith <1978> note that the power of almost
everywhere (d) identification is a consequence of the
following facts:

1) the permissible discrepancies between target and
hypothesis include those where the hypothesized program
is not defined rather than merely defined differently
from the target at some point;

2) the exact number of discrepancies needed for any
particular function is unknown.

Any  definition  of  almost  everywhere  identification  that 
denies  either  of  these  two  conditions  is  reducible  to 
identification. 

Increased  power  is  not  the  only  advantage  sought  by 
settling  for  an  approximation  rather  than  a  replica  of  the 
target  function.  Approximate  learning  may  provide  a  means  to 
escape  from  the  explosive  computational  problems  seemingly 
inherent  in  the  implementation  of  inductive  inference 
machines. Almost everywhere identification is "simpler" than
identification with respect, for example, to the maximum
number of hypothesis changes necessary for a partial
recursive inductive inference machine to learn any of the
functions in a class C.

THEOREM <Case and Smith,1978> {sets of functions that can be
almost everywhere n+1 identified with 0 hypothesis changes}
strictly includes {sets of functions that can be almost
everywhere n identified}, for n ∈ N.

Note that partial recursive inductive inference machines are
assumed here.

In general it is the case that the classes of functions
which can be almost everywhere a identified by machines
restricted to b hypothesis changes form a lattice under set
inclusion.

THEOREM <Case and Smith,1978> If C_i is the class of
functions which can be almost everywhere a_i identified by
machines restricted to b_i hypothesis changes, then C_i ⊆ C_j
iff (a_i ≤ a_j and b_i ≤ b_j), where a_i and b_i ∈ N.
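Read as a statement about index pairs, the theorem says that inclusion between these classes is just the componentwise order on (a,b). A minimal sketch of that order follows, assuming each class is represented solely by its pair of bounds; the function name included is hypothetical.

```python
def included(ci, cj):
    """C_i is included in C_j iff a_i <= a_j and b_i <= b_j,
    where each class is represented by its pair (a, b)."""
    (ai, bi), (aj, bj) = ci, cj
    return ai <= aj and bi <= bj
```

For example, the pairs (2, 0) and (0, 3) are incomparable: allowing more discrepancies does not compensate for fewer hypothesis changes, nor the reverse, which is what makes the ordering a lattice rather than a chain.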

Almost  everywhere  identification  lends  itself  to  the 
notion  of  learning  variants  of  already  learnable  functions. 
The  approach  discussed  below  of  beginning  with  an 
identifiable class and "blowing it up" appears later in
Chapter  Five  for  the  fuzzy  variants  of  identification. 

Definition: f is a finite variant of a function g if g
=_* f.

Notice  that  finite  variants  of  recursive  functions  are 
always  recursive.  This  is  not  the  case  for  the  variants 
defined  in  Chapter  Five. 

THEOREM <Minicozzi,1975> If M (almost everywhere) identifies
a class C of partial recursive functions, and M is reliable
on P, then ∃, uniformly in M, M' that (almost everywhere)
identifies the finite variants of C, and is reliable on P.

Minicozzi <1975> also investigates the effect of
"recursive  variants"  (i.e.  derived  by  composition  with  some 
known,  1-1  recursive  function).  Her  investigations  in  this 
area  with  respect  to  the  "constant  bounded  functions"  (i.e. 
those  with  finite  range)  may  have  special  relevance  for 
learning  language  variants  since  the  functions  involved 
there  (cf.  3.1)  are  constant  bounded. 

At the time of writing, <Mellish,1978> is the only
attempt to define a notion of approximate learning which
permits an infinite number of discrepancies between the
target  and  successful  hypotheses.  This  is  done  in  the 
context  of  the  extrapolation  of  nonrecursive  functions  by 
total  recursive  functions.  Before  describing  these  results, 
a  brief  discussion  of  the  problems  associated  with  such 
functional  approximations  will  be  helpful. 

Similarity between the number theoretic functions
employed in this area is based upon the points of
non-equivalence, and not, for example, the continuous measures
to be found in numerical analysis <Isaacson, E.>. To judge
how different two functions f,g are, the "size" of {x : f(x)
≠ g(x)} must somehow be measured. Yet as soon as f and g are
permitted to differ for infinitely many points in their
domains, formidable conceptual problems arise in trying to
measure this. The problem is that of measuring the relative
sizes of two countably infinite sets - those points where f,
g coincide versus those points where they do not. The
obvious method, that of taking the limiting percentage of
the one set's members in arbitrary joint enumerations of the
two sets, clearly does not work, different enumerations
being capable of producing arbitrarily different answers.
Thus if f, g agree on the even numbers and disagree on the
odd, then the enumeration (1,2,3,4,5,...) gives 1/2 as the
relative agreement of f and g, whereas (1,3,2,5,7,4,...)
yields the answer 1/3.
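The dependence on the enumeration is easy to check numerically. The sketch below assumes, as in the example, that f and g agree exactly on the even numbers; the function names are hypothetical and the fractions are computed on finite prefixes only.

```python
from itertools import count, islice


def agree(x):
    """f and g are assumed to agree exactly on the even numbers."""
    return x % 2 == 0


def standard():
    """The enumeration (1, 2, 3, 4, 5, ...)."""
    return count(1)


def two_odds_then_even():
    """The enumeration (1, 3, 2, 5, 7, 4, ...)."""
    odds, evens = count(1, 2), count(2, 2)
    while True:
        yield next(odds)
        yield next(odds)
        yield next(evens)


def agreement_fraction(enumeration, length):
    """Fraction of the first `length` enumerated points on which f, g agree."""
    prefix = list(islice(enumeration, length))
    return sum(1 for x in prefix if agree(x)) / length
```

On a prefix of 6000 points the first enumeration gives a fraction of 1/2 and the second 1/3, although both enumerate exactly the same set of points.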

The most pertinent source of solutions to these
problems appears to be in the studies, notably <Rose and
Ullian,1963>, <Tsichritzis,1969;1971> and <Lynch,1974>,
which seek to weaken the concept of constructiveness by
approximating (arbitrary) functions with recursive ones.

<Ausiello and Protasi,1975> analyzes the different notions
of approximation, showing their inter-relationships and
relating them to the "global" approximations provided by
limiting recursion. Informally stated the situation is that
regardless of the exact definition of approximation used ∃
functions that are not approximable by recursive functions,
and that the classes of approximable functions corresponding
to the varying definitions are incomparable.

The  example  of  the  even  and  odd  integers  is  revealing. 
The  "standard  enumeration"  (1,2,3,...)  of  the  possible 
domains  leads  to  the  intuitively  acceptable  measure  of 
similarity.  Perhaps  this  is  why  two  of  the  three  studies 
cited  above  consider  the  standard  enumeration  as  the 
arbiter.  The  definitions  used  here  also  reflect  this 
acceptance. 

Definition: Given two functions f and g, AGREE(f,g,n) = {x
: f(x) = g(x) and x ≤ n}.

Definition: A class C of total recursive functions is
continuous if any n-tuple of integers forms the first n
values of at least one t ∈ C.
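To make AGREE concrete, the sketch below computes it for total Python functions on the natural numbers starting at 0 (the starting point is an assumption of the example), together with the finite quantity whose lim inf is used in the theorems that follow. The names agree_set and margin, and the functions f and r, are hypothetical.

```python
def agree_set(f, g, n):
    """AGREE(f, g, n): the points x <= n at which f and g coincide."""
    return {x for x in range(n + 1) if f(x) == g(x)}


def margin(f, g, p, n):
    """Size of AGREE(f, g, n) minus p*n, for a single finite n."""
    return len(agree_set(f, g, n)) - p * n


def f(x):
    return x % 3


def r(x):
    return 0
```

Here f and r agree on the multiples of 3, so the margin grows without bound for p = 0.25 but decreases without bound for p = 0.5; only in the first case is the lim inf greater than minus infinity.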

Both of the following theorems by Mellish <1978> are
stated with reference to a class C that is some (arbitrary)
continuous recursively enumerable subset of P. For
simplicity write lim inf (AGREE(f,r,n) - pn) as A(f,r,p)[1]
and denote by m the function evaluated by an extrapolating
inductive inference machine M.

THEOREM ∀ 0<p<1 ∃, uniformly in p, an inductive inference
machine M such that if for a function f A(f,r,p) > -∞ for
some r ∈ C, then M can "approximately extrapolate" f in the
sense that A(f,m,p) > -∞, although it may be different from
the earlier lim inf.

THEOREM ∀ p>0 ∃, uniformly in p, an inductive inference
machine M such that given any function f M can
"approximately extrapolate" f in the sense that lim inf
(AGREE(f,m,n)/n) ≥ lim (AGREE(f,r,n)/n) - p ∀ r ∈ C
provided the limit exists.


[1] Mellish implicitly assumes that p is a computable real
number. However, only minor alterations of his proofs are
necessary in the general case since an initial (ineffective)
construction of a p' by a suitable inversion-truncation-
inversion or truncation of p suffices to reduce the problem
to one for computable p.




Chapter  3 


LEARNING  LANGUAGES 


3.1 Introduction

A  formal  language,  L,  is  defined  to  be  any  set  of 
finite  strings  composed  from  some  finite  terminal  vocabulary 
V, i.e. L ⊆ V* <Hopcroft and Ullman,1969>. While not
departing from this definition of L, this thesis considers L
as  the  extensional  definition  of  its  characteristic 
function.  Although  constituting  a  simple  shift  in 
perspective,  this  permits  the  clarification  of  the 
relationship  of  functional  to  linguistic  learning,  and  the 
provision  of  a  straightforward  generalization  of  the 
standard  language  learning  material  to  fuzzy  languages. 

The characteristic function ch_L of a formal language
need not be recursive. However, it is customary to assume
that the languages dealt with are at least generated by a
Type 0 grammar[1] so that at worst there is a partial
recursive function that is identical to ch_L for all strings s
such that ch_L(s)=1, and is undefined elsewhere. In other
words there is always a semi-characteristic function sch_L for
L.

[1] Until the development of limiting recursion this
assumption was mandatory to even make sense of the problem
statement since for non-Type 0 languages there was no finite
encoding device or name to be discovered. The normal form
indexing for limiting recursive functions <cf. Criscuolo et
al.,1975> may have changed this.
Both the characteristic and semi-characteristic
functions can be used to name a formal language in Chomsky's
problem. They correspond to Gold's <1967> "tester" and
"generator" naming schemes. He proves that if identification
in the limit is possible given naming scheme N_1, then it is
possible given naming scheme N_2 if there is a limiting
recursive translation from N_1 to N_2. The actual names used
for both tester and generator naming schemes in <Gold,1967>
are partial recursive function indices. There is then an
obvious recursive translation of testers to generators, yet
even for recursive languages there is no limiting recursive
translation in the opposite direction.
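The "obvious" direction of translation can be sketched as below. Since a Python function cannot conveniently be made to diverge in a testable way, this sketch models "undefined" by raising a hypothetical Undefined exception; that modelling choice, and all the names used, are assumptions of the example rather than anything in <Gold,1967>.

```python
class Undefined(Exception):
    """Stands in for a computation that never halts."""


def generator_from_tester(ch):
    """Turn a tester (total 0-1 valued function) into a generator
    (semi-characteristic function: 1 on members, undefined elsewhere)."""
    def sch(s):
        if ch(s) == 1:
            return 1
        raise Undefined(s)  # a real sch_L would diverge here
    return sch


def is_defined(partial, s):
    """Report whether the modelled partial function is defined at s."""
    try:
        partial(s)
    except Undefined:
        return False
    return True


def ch_L(s):
    """Tester for the example language L = {ab, aa}."""
    return 1 if s in {"ab", "aa"} else 0


sch_L = generator_from_tester(ch_L)
```

No comparable wrapper exists in the other direction: from sch_L alone, a failure to halt on s can never be distinguished, after any finite time, from a computation that has simply not halted yet.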

Before proceeding, the basic scenario for language
learning employed in this thesis should be sketched rather
more precisely. Partial enumerations drawn from an
enumeration of either the characteristic or
semi-characteristic function of a language L are input to an
inductive inference machine M. So, for example, if L = {ab,
aa}, then M might receive an input sequence like
((ab,1),(a,0),(aaa,0),...). To identify L in the limit, say,
M must converge to a name for either the characteristic or
semi-characteristic function of L.
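The two kinds of input sequence can be generated as follows for the example language. This is a sketch under assumptions: strings are listed shortest first, the enumeration of sch_L simply cycles through the members of a finite language (repetition being assumed permissible), and the function names are hypothetical.

```python
from itertools import count, cycle, islice, product


def strings(alphabet):
    """All strings over the alphabet, shortest first."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def ch_enumeration(language, alphabet):
    """One enumeration of the characteristic function: pairs (s, 0 or 1)."""
    for s in strings(alphabet):
        yield (s, 1 if s in language else 0)


def sch_enumeration(language):
    """One enumeration of the semi-characteristic function of a finite
    language: only members appear, each paired with 1."""
    for s in cycle(sorted(language)):
        yield (s, 1)


L = {"ab", "aa"}
```

The first five pairs of ch_enumeration(L, "ab") are ("",0), ("a",0), ("b",0), ("aa",1), ("ab",1), whereas sch_enumeration(L) never mentions a non-member at all.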

Thus far the language learning scenario should seem
little different from the previous function learning
material. Indeed it may appear to be so obvious an extension
as to scarcely merit separate statement. If so, it can come
as something of a shock to realize that it is not the one
traditionally employed in language learning studies. The
usual scenario <cf. Gold,1967; Wiehagen,1977> presents a
language as a function of time. That is, the inductive
inference machine is presented with sequences of the form
((1,±s_1),(2,±s_2),...) where +s implies that s ∈ L, and -s
that s ∉ L, and the first variable is taken to refer to
discrete time intervals. While not affecting the basic
results, this can lead to some minor differences with
respect to the various types of enumeration[1], and does not
emphasize the parallels between the linguistic and
functional studies.

[1] A few examples of this and a terminology difficult to
disentangle from this traditional approach have meant that
few citations of <Wiehagen,1977> appear here.

However, despite the basic unity between the two
fields, differences arise because:

*Any extensions to partial functions are deemed
permissible for function learning, but not for language
learning.

*In function learning, the function that must be
correctly named is the function that is enumerated. For
languages there are two possible functions, i.e. the
characteristic and semi-characteristic functions, either
one of which the inductive inference machine can be
required to name on the basis of enumerations of the
other.

I  will  elaborate. 

Four  situations  determine  the  analysis  of  the  task  of 
learning  a  recursively  enumerable  language  L  on  the  basis  of 
some  enumeration  E: 


1) ch_L is recursive and E is of ch_L;

2) ch_L is recursive and E is of sch_L;

3) ch_L is not recursive and E is of ch_L;

4) ch_L is not recursive and E is of sch_L.

With respect to the tester naming scheme:

Case 1 is exactly that of learning the function ch_L as
in Chapter 2.

Case 2 is a new problem, that of learning the function
ch_L as in Chapter 2 with only partial information.

Cases 3 and 4 are impossible to solve, by definition.

With respect to the generator naming scheme:

Case 1 can be transformed to the task of performing Case
1 with respect to the tester naming scheme and
subsequently deriving sch_L from ch_L.

Case 2 is a new problem whose solution can sometimes,
but not always, be obtained as in Case 1 above (i.e.
Case 2 is easier for a generator than for a tester).

Cases 3 and 4 are essentially problems of learning sch_L
as in Chapter 2, but with one crucial difference.
Chapter Two's acceptance of hypotheses that compute
extensions to the target partial function corresponds
here to the acceptability of hypotheses generating
super-sets of a target language. This is patently
unsatisfactory (since a universal grammar generating all
of V* would then be a general solution). Only some
extensions of a language's generation are permissible,
namely all 0-extensions.
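The restriction can be illustrated on a finite sample. The sketch below rests on one reading of the text, stated here as an assumption: a hypothesis is an acceptable extension of sch_L when it returns 1 on every member and, on every non-member, is either undefined or returns 0. Hypotheses are modelled as dictionaries, with a missing key standing for "undefined"; all names are hypothetical.

```python
def is_zero_extension(language, hypothesis, sample):
    """Check, on a finite sample of strings, that the hypothesis extends
    sch_L only by the value 0 (assumed reading of the permitted extensions)."""
    for s in sample:
        value = hypothesis.get(s)   # None models "undefined"
        if s in language:
            if value != 1:
                return False
        elif value not in (None, 0):
            return False
    return True


L = {"ab", "aa"}
SAMPLE = ["", "a", "b", "aa", "ab", "ba"]

exact = {"aa": 1, "ab": 1}                       # sch_L itself
padded = {"aa": 1, "ab": 1, "a": 0, "ba": 0}     # extended by 0 only
universal = {s: 1 for s in SAMPLE}               # accepts everything
```

Under this reading exact and padded are both acceptable names, while universal, which plays the part of the grammar generating all of V*, is rejected.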

The  relationship  between  functional  and  linguistic 
studies  has  been  further  muddied  by  the  fact  that  since 
Gold <1967> the studies in language learning have
normally  used  Chomsky  Type  grammars,  rather  than  program 
indices,  to  name  languages.  When  Type  0  grammars  are 
used,  this  has  no  effect  on  the  analysis  (and  is 
operating  within  the  generator  naming  scheme)  since 
there  is  an  obvious  recursive  translation  from  such 
grammars  to  the  indices  of  partial  recursive  functions 
(where,  of  course,  the  partial  recursive  functions  are 
now  taken  to  be  functions  from  V  *  to  N  rather  than  the 
usual  N  to  N)  and  vice  versa. 

THEOREM <Hopcroft and Ullman,1969> If L is generated by
a Type 0 grammar G then ∃, uniformly in G, a partial
recursive function index i such that t_i = sch_L. Conversely
given any (0-1 valued) partial recursive function index
i ∃, uniformly in i, a Type 0 grammar G_i such that
t_i(s)=1 iff sch_L(s)=1, where L is the language generated
by G_i.

It  is  the  use  of  Type  1,2, or  3  grammars  as  the  sole 
permissible  names  for  the  language  being  learned  that  is 
initially  so  confusing.  Of  course,  such  a  restriction  of 
the "Hypothesis Space" (cf. 3.2) trivially implies that
only  Type  1,2,  or  3  languages,  respectively,  can  be 
learned.  More  significantly  it  destroys  the  distinction 
between  the  generator  and  tester  naming  schemes  since 
such  grammars  not  only  generate  the  language  but  also 
permit  membership  to  be  decided  algorithmically. 

THEOREM <Hopcroft and Ullman,1969> G is a context
sensitive grammar implies ∃, uniformly in G, r ∈ P such
that ch_L(G) = r.
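The decidability behind this theorem comes from the fact that no production of such a grammar shortens a string, so a search over sentential forms no longer than the candidate string must terminate. The following is a minimal sketch of that bounded search, assuming single-character symbols and productions given as (left, right) string pairs; the grammar shown, for strings of the form a^n b^n c^n, is an illustrative example and is not taken from the thesis.

```python
from collections import deque


def csg_member(productions, start, target):
    """Decide membership for a grammar whose rules never shrink a string."""
    if any(not lhs or len(rhs) < len(lhs) for lhs, rhs in productions):
        raise ValueError("productions must be non-empty and non-contracting")
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for lhs, rhs in productions:
            pos = form.find(lhs)
            while pos != -1:
                new = form[:pos] + rhs + form[pos + len(lhs):]
                # Longer forms can never shrink back to the target.
                if len(new) <= len(target) and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return False


# Hypothetical non-contracting grammar for a^n b^n c^n with n >= 1.
GRAMMAR = [
    ("S", "aSBC"), ("S", "aBC"), ("CB", "BC"),
    ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc"),
]
```

Because only finitely many strings of length at most that of the target exist, the search always halts with an answer, which is why such a grammar serves as a tester as well as a generator.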

So  in  other  words,  work  dealing  with  such  grammars 
appears  to  operate  within  the  generator  naming  scheme 
while  actually  working  within  the  tester  naming  scheme. 
Even  this  statement  should  be  modified  slightly  however, 
since  any  class  of  Chomsky  grammars  is  recursively 
enumerable  and  consequently  there  are  total  recursive 
characteristic  functions  that  are  not  the  characteristic 
function  of  any  context  sensitive  language.  Furthermore, 
each  class  of  Chomsky  grammars  determines  a  certain 
"complexity  class"  much  as  those  used  in  the  previously 
surveyed material in <Blum and Blum,1975> <cf. Hopcroft
and Ullman,1969>. In short then, such a Hypothesis Space
automatically  restricts  attention  to  certain  recursively 
enumerable  complexity  classes  of  total  recursive 
functions. 

The  introduction  of  a  different  way  of  naming 
languages  customarily  entails  a  special  investigation. 
So, for example, transformational grammars are treated
in <Hamburger and Wexler,1973a;1973b;1973>, regular
bilanguages in <Pair,1976>, transition network grammars
in <Chou et al.,1976>, VL decision rules in <Larson et
al.,1977>, and L-systems in <Coy and Pfluger,1979>.

This chapter in general continues Gold's <1967> use
of partial recursive function indices as language names.

3.2  The  General  Framework 

As  just  noted,  the  material  of  the  previous  chapter 
transfers  directly  to  the  more  general  problem  of  learning 
languages.  Both  Chapter  Two's  results  and  those  peculiar  to 
language  learning  studies  are  illuminated  when  viewed  in 
terms of differing specifications along the seven dimensions
of:

1. The Naming Scheme

2. The Hypothesis Space

3. The Sample Presentation

4. The Inductive Inference Process Allowed

5. The Fundamental Limiting Criterion

6. The Secondary Limiting Requirements

7. The Interim Constraints *

* This is essentially the breakdown given in <Biermann et
al.,1972>. That survey did not recognize the influence of
#1, #4 or #6, and omitted aspects of #5 and #7 (e.g.
extrapolation, consistency).

The Naming Scheme has already been discussed at some
length for languages. In passing, it seems rather remarkable
that the programs-for-extensions naming scheme should not
have been challenged or even received the most cursory of
examinations in the functional setting.

The Hypothesis Space H[1] is a given set of language
names[2] from which the inference process must choose its
hypotheses. In a sense it represents a minimal rationalist
or Chomskian concession in what is otherwise an empiricist
analysis. As indicated previously with respect to Chomsky
grammars, the hypothesis space can affect the general
problem profoundly by providing the general form of the
target language (for example, indicating whether it is
regular or what its complexity requirements are) and, by
serving as an a priori framework, facilitating or impeding
the search for a language's name. It is a concept that
although relevant to functional learning, usually does not
receive explicit treatment in that context, presumably being
either P or a recursively enumerable subset of R. Since
language learning studies conventionally take place under
the aegis of "grammatical inference", a commonly employed
hypothesis space is, for example, the set of all Context
Free grammars. The most general hypothesis space is the set
of Chomsky Type 0 grammars, or P.

[1] Note that this concept has not been applied to
extrapolation.

[2] Names and names of names are deliberately conflated here.
It is as if H contained subsets of P or R or the Chomsky
grammars, rather than N. See the various papers by Barzdin
for some consideration of the features obscured by this.
The hypothesis space within which an inductive process
P is constrained to operate should be distinguished from P's
power, although they are often the same by construction. So,
for example, a Popperian machine has R as its hypothesis
space but can only identify recursively enumerable subsets
of R.

The  possible  solutions  to  the  language  learning  problem 
intuitively  appear  to  depend  upon  such  things  as  whether  the 
hypothesis  space  H  contains  a  name  for  the  target  language, 
whether  E  is  recursively  enumerable,  the  decidability  of  the 
members  of  H  (take  for  example  the  difference  between  an  H 
containing  only  the  strictly  Type  0  grammars  for  the  class 
of  regular  languages,  versus  that  containing  the  Type  3 
grammars), and so on. A very common assumption is that a
Hypothesis Space H is admissible, i.e. that H is recursively
enumerable, and the members of H are decidable.

The Sample Presentation refers to what, in the previous
chapter, was called the "enumeration". As indicated in the
previous section, the issue is slightly more complex here
since decisions must be made not only as to which class of
enumerations the inductive process should be successful
upon, but also as to whether to represent a language by its
characteristic or semi-characteristic function. The latter
decision leads to the central distinction for language
learning, that between "text" and "informant" sample
presentations.

Definition: A sample presentation S for a language L is
(arbitrary) text if S is an enumeration of the
semi-characteristic function of L.

Definition: A sample presentation S for a language L is
(arbitrary) informant if S is an enumeration of the
characteristic function of L.

And  things  such  as  "primitive  recursive  text"  and 
"increasing  informant"  may  be  defined  in  the  obvious  manner. 

Sometimes the distinction is made between "complete"
and "incomplete" text (informant). Only complete sample
presentations, that is only sample presentations that are
enumerations, without omissions, of a language's
characteristic or semi-characteristic function, are
considered here. Another kind of sample presentation that is
not mentioned further, but which arises in language learning
studies sufficiently frequently to warrant comment, is that
of a "teacher" <cf. Knobe et al., 1976>. This has not yet been
adequately formalized. Finally, some researchers have used
sample presentations that embody the wish to specify the
language in accordance with a certain schedule with respect
to the lengths of the strings in the partial enumerations.
Such text presentations have been called "effectively
quasi-ordered by length" in <Wharton, 1974>, and correspond
to a methodical enumeration of $sch_L$.

Text and informant embody the distinction between
examples and counter-examples so commonly employed in
pattern recognition and linguistically oriented studies
<cf. Fu, 1974>. If a pattern or language is thought of as its
characteristic function, then text presents the inductive
inference machine M with all and only the members of the
pattern or language, whereas informant presents M with both
members and nonmembers labelled as such. Since text is just
the enumeration of a partial recursive function f it is
always possible to generate it algorithmically (by a
standard dovetailing of the individual computations that f
performs on each $s \in V_T^*$). Informant obviously cannot be
generated algorithmically if $ch_L$ is not recursive.
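
The difference between the two presentations, and the
dovetailing argument, can be sketched in Python. This is only
an illustration under stated assumptions: the helper names
(`strings`, `informant`, `dovetail_text`, `even_length`) are
hypothetical, a semi-characteristic function is modelled as a
generator that takes one step per `next` call and halts exactly
on members, and the toy language (even-length strings) is
chosen purely for convenience.

```python
from itertools import count, islice, product


def strings(alphabet):
    """Enumerate every string over `alphabet`, shortest first."""
    for n in count(0):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def informant(member, alphabet):
    """Informant: every string, labelled by the characteristic function."""
    for s in strings(alphabet):
        yield s, member(s)


def dovetail_text(semi_decider, alphabet):
    """Text: interleave the computations of a semi-decider on all strings.

    `semi_decider(s)` must return a generator that halts iff s is a member.
    Each round starts one new computation and advances every pending one
    by a single step, so no diverging computation blocks the others.
    """
    pending = []
    source = strings(alphabet)
    while True:
        new = next(source)
        pending.append((new, semi_decider(new)))
        still_pending = []
        for s, computation in pending:
            try:
                next(computation)              # one more step
                still_pending.append((s, computation))
            except StopIteration:
                yield s                        # halted, so s is a member
        pending = still_pending


def even_length(s):
    """Toy semi-decider: halts on even-length strings, diverges otherwise."""
    if len(s) % 2:
        while True:
            yield
    for _ in s:
        yield


first_text = list(islice(dovetail_text(even_length, "ab"), 5))
first_informant = list(islice(informant(lambda s: len(s) % 2 == 0, "ab"), 3))
```

The text contains members only, in whatever order their
computations happen to halt, whereas the informant labels
every string and therefore needs a total decision procedure.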

The Inductive Process is the "mechanism" that generates
the hypotheses upon the input of a function's enumeration.

As stated previously, this thesis deals exclusively with
solitary inductive inference machines and the treatment
expresses this. However, communities of inductive inference
machines, communities of communities, and so on, as in
<Schubert, 1974>, are much more powerful mechanisms for
inductive inference.* A third possibility is to equip an
inductive inference machine with a Bernoulli generator
(p=1/2), which also results in a more powerful device
<cf. Barzdin et al., 1972; Podnieks, 1975>.

The Limiting Criterion is concerned with the long term
behavior of the sequence of outputs emitted by the inductive
inference machine when a language's enumeration is input to
it via successive partial enumerations. As for function
learning, extrapolation, matching, and identification in the
limit are the major categories for language learning. The
choices of whether to use the generator or tester naming
schemes, and whether to enumerate the characteristic or the
semi-characteristic function, give each of the latter two
terms 4 distinct meanings.

* These mechanisms rapidly exceed what is usually known as
inductive inference. It has been suggested that
"evolutionary" rather than "inductive" may better describe
the process <Schubert, 1974; Hugel, 1977>. The issue is
whether inductive inference necessarily is a solitary
activity.

Definition: An inductive inference machine M identifies
a language L in the limit, with respect to the generator
naming scheme, given text [informant], if for every
enumeration of $sch_L$ [$ch_L$] $\exists i$ such that M
converges to i and i is an index for a program that computes
some 0-extension of $sch_L$.

Definition: An inductive inference machine M identifies
a language L in the limit, with respect to the tester
naming scheme, given text [informant], if for every
enumeration of $sch_L$ [$ch_L$] $\exists i$ such that M
converges to i and $t_i = ch_L$.

The  definitions  for  matching  are  exactly  analogous. 

For  convenience  the  qualifications  ("with  respect  to..."  and 
"given...")  are  often  omitted  when  the  context  is  clear. 

Feldman <1969; 1972; 1977> investigates a concept he
calls "strong approachability", which requires not only that
the sequence of hypotheses contain infinitely many
repetitions of a correct name, but also that any incorrect
name for the language appear only finitely often in the
hypothesis sequence.

The Secondary Limiting Requirements refer to features
such  as  reliability,  minimality,  the  number  of  permissible 
hypothesis  changes,  and  so  on.  They  too  are  requirements 
upon  the  hypothesis  sequence  as  a  whole,  rather  than  upon 
any  single  hypothesis.  Few  of  these  have  been  investigated 
specifically  within  the  linguistic  context. 

The  Interim  Constraints  deal  with  such  things  as  the 
compatibility  of  individual  hypotheses  with  the  current 
partial  enumeration  (i.e.  consistency),  good  encodings  of 
current  partial  enumerations,  the  decidability  of  individual 
hypotheses,  and  so  on. 

The precise formalizations dealing with the language
learning problem can now be expressed schematically along
these dimensions: For a language L in some class C, a
sequence of partial enumerations of L in accord with some
Sample Presentation and Naming Scheme is input to an
Inductive Inference Process P. The resulting sequence of P's
outputs ("hypotheses") must satisfy the Fundamental Limiting
Criterion as understood in the light of the Naming Scheme,
and possibly some Secondary Limiting Requirements. If the
Fundamental Limiting Criterion is either matching or
identification then each of P's outputs is required to
belong to the Hypothesis Space. Moreover each of P's outputs
may also be required to satisfy the Interim Constraints.

3.3 The Effect of the Naming Scheme

It  is  trivially  the  case  that  if  a  language  is  not 
recursive  then  the  tester  naming  scheme  dooms  to  failure  any 
attempts  at  identification  or  matching.  Not  so  obvious  is 
the  fact  that  even  for  recursive  languages  the  tester  naming 
scheme  can  have  an  adverse  effect. 

THEOREM <Gold, 1967> If a class C is identifiable in the
limit  with  respect  to  the  tester  naming  scheme,  given 
primitive  recursive  text,  then  either  C  does  not  contain  all 
finite  languages  or  C  does  not  contain  any  infinite 
language. 

This  contrasts  sharply  with  the  fact  that: 

THEOREM <Gold, 1967> The class of all recursively enumerable
languages  is  identifiable  in  the  limit  with  respect  to  the 
generator  naming  scheme,  given  primitive  recursive  text. 

The first result follows from a proof much like that
used to show the non-identifiability of R. That is, given an
inductive inference machine M, a primitive recursive
function is (ineffectively) constructed that enumerates an
infinite language in such a way as to cause M to hypothesize
ever larger finite languages, thereby never converging to a
correct decision procedure for the entire infinite language.
Intuitively this counter-example does not apply for the
generator naming scheme since it suffices to enumerate the
primitive recursive functions and, by checking for
compatibility, eventually output the index of the primitive
recursive function that is enumerating $sch_L$ (and hence
which provides a legitimate name for L with respect to the
generator naming scheme). This is in fact the idea used in
the proof of the latter result (and has appeared before, in
section 2.2.1, for P and primitive recursive enumerations).
Various paragraphs of Section 3.1 are relevant here also.
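
The enumeration idea behind the second theorem can be sketched
as follows. This is a toy illustration, not Gold's
construction: a short, hand-written list of total functions
(the hypothetical `hypotheses`) stands in for an effective
enumeration of the primitive recursive functions, and the
learner simply outputs the least index that is compatible
with the text seen so far.

```python
def identify_by_enumeration(hypotheses, prefix):
    """Least index whose enumerating function reproduces `prefix`."""
    for index, h in enumerate(hypotheses):
        if all(h(n) == observed for n, observed in enumerate(prefix)):
            return index
    return None


# Hypothetical stand-ins for an effective list of enumerating functions.
hypotheses = [
    lambda n: "a" * (n % 2),      # enumerates {"", "a"} with repetitions
    lambda n: "a" * n,            # enumerates a*
    lambda n: "a" * (2 * n),      # enumerates (aa)*
]

target = hypotheses[2]            # the function producing the text
text_so_far = []
guesses = []
for n in range(5):
    text_so_far.append(target(n))
    guesses.append(identify_by_enumeration(hypotheses, text_so_far))
```

After the second element the guess settles on index 2 and
never changes again; the index names the enumerating function
and so serves as a generator name for the language.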


3.4  The  Effect  of  the  Hypothesis  Space 

Much of the discussion in Section 3.1 is directly
relevant here.

The  hypothesis  space  can  provide  a  convenient  notation 
for,  and  enumeration  of,  the  potential  solutions.  This  is 
one  of  the  great  benefits  of  hypothesis  spaces  of,  say, 
context  free  grammars.  Efficient  generation  of  such 
grammars, while not trivial (cf. Wharton's tests in section
3.6),  is  facilitated  by  their  comparatively  simple 
structure. 

Cleverly chosen hypothesis spaces may be able to
circumvent the general limitations on inductive inference
machines. Thus, using certain L-systems as hypothesis
spaces, it is possible to identify D0L and DP0L given text
<Coy and Pfluger, 1979>. This does not deny the general
results on the "poorness" of text described in section 3.5
since these classes do not contain all finite languages.
Nevertheless they are "useful" classes. The search for such
restricted yet interesting hypothesis spaces has long been a
preoccupation of researchers hoping to cut across the
boundaries of the conventionally learnable classes.
Hypothesis spaces of certain subsets of the context free
grammars <e.g. Crespi-Reghizzi, 1971>, Type 3 grammars <e.g.
Gold, 1972>, certain kinds of Lisp programs <e.g. Biermann et
al., 1977>, simple programs containing no loops <e.g.
Treister et al., 1978>, and "l-pattern grammars"
<cf. Angluin, preprint> have received attention.

In studies on (exact) identification and matching it is
usually assumed that a name for the target language occurs
within the Hypothesis Space. This is not always the case for
the approximate variations. For $\epsilon$-identification
(cf. end of next section), given an admissible hypothesis
space H generating a class of languages dense in the
universal class of languages, then if a name for the target
does not occur in H, as $\epsilon \to 0$ the sequence of
intrinsic complexities of the hypotheses diverges
<Wharton, 1974>.

3.5  The  Effect  of  Text  vs  Informant 

Despite  the  example  of  section  3.3  using  primitive 
recursive  text  and  the  generator  naming  scheme,  in  general 
it  is  impossible  to  identify  or  match  "large"  sets  of 
languages  given  text. 

THEOREM <Gold, 1967> If a class C is identifiable in the
limit given effective text, then either C does not contain
all finite languages or C does not contain any infinite
language.

Unlike the situation for primitive recursive text, there is
no enumeration of the class of enumerating functions (i.e.
R), and so no obvious way to acquire a name for even the
target's semi-characteristic function. An exactly analogous
result is shown in <Feldman, 1972> for matching given
recursive text.

A  result  generalizing  Gold's  theorem  requires  several 
new  concepts  for  its  statement. 

Definition: A chain of languages is any sequence
$C = (L_1, L_2, \ldots, L_n)$ of languages such that
$L_1 \subseteq L_2 \subseteq \ldots \subseteq L_n$.

Definition: A chain C is infinite if C has infinitely
many distinct members.

Definition: An infinite chain $C = (L_1, L_2, \ldots)$ has a
fixpoint F if $\exists$ a language F such that
$L_1 \subseteq L_2 \subseteq \ldots \subseteq L_n \subseteq \ldots \subseteq F$.

THEOREM <Coy and Pfluger, 1979> If a class C of languages
contains  an  infinite  chain  of  languages  and  its  fixpoint, 
then  C  is  not  identifiable  given  text. 

The dilemma at the root of the problems with text is
that it does not distinguish super-sets of the target
language L from L itself. That is, nothing ever appears in a
text of L that invalidates a super-set name.*
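
The point can be made concrete with a small Python sketch.
Languages are modelled here as finite Python sets purely for
illustration, and the function names are hypothetical.

```python
def refuted_by_text(hypothesis, text_prefix):
    """A text can only refute a hypothesis by showing a string it lacks."""
    return any(s not in hypothesis for s in text_prefix)


def refuted_by_informant(hypothesis, informant_prefix):
    """An informant also refutes a hypothesis that accepts a non-member."""
    return any((s in hypothesis) != label for s, label in informant_prefix)


target = {"a", "aa"}
superset = {"a", "aa", "aaa"}

text_prefix = ["a", "aa", "a", "aa"]                  # members of target only
informant_prefix = [("a", True), ("aa", True), ("aaa", False)]

text_verdict = refuted_by_text(superset, text_prefix)
informant_verdict = refuted_by_informant(superset, informant_prefix)
```

However long the text of the target runs, the super-set
hypothesis survives; a single labelled non-member in an
informant eliminates it.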

The  simple  way  around  this  problem  is  to  use  either 
effectively  quasi-ordered  by  length  or  increasing  text.  Both 
of  these  kinds  of  text  implicitly  supply  the  necessary 
counterexamples  and  so  are  equivalent  to  informant. 

The previously cited non-identifiability results employ
a vast number of repetitions to "fool" the inductive
inference machine. This suggests another way of improving
the performance of text, namely bounding the permissible
number of repetitions of any element of $sch_L$. The results
shown by Feldman et al. <1969> indicate that while this
enlarges the potentially learnable class, the difference is
not terribly significant.

Pursuing  the  idea  of  bounding  repetitions,  it  seems  to 
the  author  that  if  the  number  of  occurrences  of  every 
element  in  a  text  is  known  to  follow  some  non-zero  limiting 
frequency  then  that  text  is  equivalent  to  informant.  This 
constraint  on  the  limiting  frequencies  is  precisely  what 
gives  the  probabilistic  language  learners  their  power  on 
text <Horning, 1969>, but no equivalent results exist for
non-probabilistic language learning.

Another way of "improving" text is given by
Crespi-Reghizzi <1971>. He constructs machines that identify
non-trivial subsets of the Context Free languages when given
parse trees rather than mere strings. However this is
beginning to stray considerably from the statement given
initially for Chomsky's problem, and so is not discussed
further.

* In passing it should be pointed out that this fact also
invalidates the strategy of finding a minimal name for each
partial text. This is suggested by the example of the
language {a*} - {a}. Given any text presentation of this
language, a correct hypothesis intuitively must be more
complex than any hypothesis that generates {a*}.

The  previous  material  gives  various  ways  to  make 
machines  more  powerful  given  text.  In  a  sense,  a  bottom  line 
to  this  is  given  by: 

THEOREM <Gold, 1967> If C is identifiable given recursive
text,  then  C  is  identifiable  given  arbitrary  text. 

So then, the question remains: What CAN be learned
given  (arbitrary)  text? 

THEOREM  <Gold,1967>  Any  class  of  finite  languages  is 
identifiable  in  the  limit  given  text. 

Clearly  all  that  must  be  done  is  to  always  hypothesize 
exactly  the  current  partial  text. 
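
A minimal Python sketch of this learner, with finite sets
serving as names for finite languages (the function name and
the sample text are illustrative assumptions):

```python
def finite_language_learner(text):
    """After each input, hypothesize exactly the strings seen so far."""
    seen = set()
    for s in text:
        seen.add(s)
        yield frozenset(seen)


# A text for the finite language {"a", "b"}, with repetitions.
text = ["b", "a", "b", "a", "a", "b"]
hypotheses = list(finite_language_learner(text))
changes = sum(
    1 for previous, current in zip(hypotheses, hypotheses[1:])
    if previous != current
)
```

Once every member of a finite language has appeared, the
hypothesis can never change again. On a text for an infinite
language the same learner changes its hypothesis infinitely
often, which is the behavior exploited in the negative
results above.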

IDtext has been variously characterized in <Hamburger and
Wexler, 1973b>, <Hugel, 1977>, <Wiehagen, 1977>, <Feldman et
al., 1969>, <Angluin, 1975b>, and <Coy and Pfluger, 1970>.

THEOREM <Angluin, 1979b> For C any class of recursive
languages, C is identifiable given text if either:

1) Any finite set of strings is contained in only
finitely many of the languages in C.

OR

2) a) The containment problem is solvable for languages
within C.
b) For each language $L \in C$, there is a finite
sublanguage of L for which no language belonging to C
both contains the finite sublanguage and is itself
included in L.
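
For a small class of finite languages, condition 2b can be
checked by brute force. The sketch below rests on two
assumptions that are not in the theorem statement: languages
are represented as explicit finite Python sets, and "included
in L" is read as proper inclusion. The name `witness_subset`
is hypothetical.

```python
from itertools import combinations


def witness_subset(L, C):
    """Smallest finite subset T of L such that no M in C satisfies
    T <= M and M properly included in L (condition 2b)."""
    for size in range(len(L) + 1):
        for candidate in combinations(sorted(L), size):
            T = set(candidate)
            if not any(T <= M and M < L for M in C):
                return T
    return None


C = [{"a"}, {"a", "aa"}, {"a", "aa", "aaa"}]
T = witness_subset({"a", "aa", "aaa"}, C)
```

Here the witness for the largest language is the single
string that none of its sub-languages in C contains, so a
learner that has seen it can safely rule those sub-languages
out.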

Conversely:

THEOREM <Angluin, 1979b> If a class C of recursive languages
is identifiable given text then C satisfies condition 2b
above.

There are several results that appear to suggest that
text may not be so terribly limited for approximate
identification. However close examination tends to destroy
such optimism.

THEOREM <Wiehagen, 1977> $\exists$ an identifiable (by text)
class C of languages such that (L is a recursively enumerable
language) implies ($\exists L' \in C$ and L' is almost
everywhere identical to L).

Inspection reveals that this class corresponds to the class
of self describing functions, and unfortunately there is no
effective method to acquire a name for L' when presented
with the text of L.

A few definitions are required before presenting Wharton's
seemingly powerful results on approximate identification.

Definition: $W = (w_1, w_2, \ldots)$ is a sequence of weights
if $\forall i$ $w_i$ is positive and $\sum_i w_i = 1$.

Definition: Given some finite terminal vocabulary $V_T$ and
some sequence of weights W, for $L \subseteq V_T^*$,
$norm_W(L) = \sum_i ch_L(s_i) \cdot w_i$ where the strings
$s_i$ are lexicographically ordered.

Definition: Given some finite terminal vocabulary $V_T$ and
some sequence of weights W, for $L_1$ and
$L_2 \subseteq V_T^*$,
$dist_W(L_1, L_2) = norm_W(L_1 \oplus L_2)$, where $\oplus$
stands for the symmetric difference.

Definition: $\epsilon$-identification, with respect to some
sequence of weights W, is defined like identification
except that the index i converged upon must satisfy
$dist_W(L_i, L) < \epsilon$ where $L_i$ is the language
specified by $t_i$, and L is the target language.

$dist_W$ is a metric on the class of all languages within
$V_T^*$ (i.e. the "universal class" of languages), and so
permits of such metric space concepts as denseness. The
corresponding notion of $\epsilon$-matching has not been
defined, but appears to present no new problems. The
following theorems are all stated assuming an admissible
hypothesis space that generates a class C of languages dense
in the universal class of languages.¹
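
The weighted distance can be approximated numerically. The
sketch below is an illustration only and makes several
assumptions of its own: the weights are taken to be
$w_i = 2^{-(i+1)}$ (one valid choice, since they are positive
and sum to 1), strings are ordered by length and then
alphabetically, languages are given as membership predicates,
and the infinite sum is truncated after `n_terms` strings, so
the result underestimates the true distance by at most
$2^{-n\_terms}$.

```python
from fractions import Fraction
from itertools import count, islice, product


def strings(alphabet):
    """Enumerate strings over `alphabet` by length, then alphabetically."""
    for n in count(0):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def weight(i):
    """Assumed weights w_i = 2^-(i+1) for i = 0, 1, 2, ..."""
    return Fraction(1, 2 ** (i + 1))


def dist(member1, member2, alphabet, n_terms):
    """Truncated weight of the symmetric difference of two languages."""
    total = Fraction(0)
    for i, s in enumerate(islice(strings(alphabet), n_terms)):
        if member1(s) != member2(s):       # s lies in the symmetric difference
            total += weight(i)
    return total


everything = lambda s: True                # the language a*
short_only = lambda s: len(s) < 5          # a finite language

d = dist(everything, short_only, "a", 20)
```

The two languages disagree on every string of length five or
more, yet their distance stays below 1/32, because the weights
of long strings are so small.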

THEOREM <Wharton, 1974> $\forall \epsilon > 0$, C is
$\epsilon$-identifiable given text.²

THEOREM <Wharton, 1974> $\forall \epsilon > 0$, C is
$\epsilon$-identifiable in fixed time given effectively
quasi-ordered by length text.

These results suggest that an approximate learner can
be very powerful. However their force is vitiated somewhat
by the fact that the measures are "weighted metrics", and as
such have the feature that for any $\epsilon > 0$ and any
language L, there is a language L' that is almost everywhere
distinct from L yet $dist_W(L, L') < \epsilon$. So languages
are being identified by matching only some finite portion of
them, in this case all strings up to a certain length. The
justification given for this is that it is the short strings
which are important in any practical sense.

¹ {finite languages} is an example of such a class.

² Informant allows identification in known time.

In contrast to text, informant provides full
information about both the characteristic and
semi-characteristic functions. In language oriented studies
the discussion is most often about recursive languages for
which the results in chapter two apply directly.

3.6  About  Implementations 

Any discussion of the implementations that are
tolerably efficient and provably valid on more than the
handful of examples used by the designer must be
extraordinarily brief. For the worthwhile results in
language learning currently consist not of practical designs
for inductive inference machines but of abstract
specifications for the various possibilities. The insights
thus far do not provide, nor even suggest, efficient general
solutions. This should not be taken to indicate the
subject's irrelevance, but rather its current focus. To
paraphrase a quote given in <Horning, 1969>: although
proving the existence of a solution is only a first step
toward finding a practical solution, in a subject replete
with logically intractable problems it is worthwhile
delineating even the possibilities for solution.

Nevertheless,  some  mention  should  be  made  of  the 
various  studies  aimed  at  the  creation  of  more  or  less 
"practical"  language  learners.  The  fundamental  distinction 
made  between  the  various  implementations  is  that  between 
inductive inference machines that are "enumerative" versus
those  that  are  "constructive". 

As  the  term  suggests,  enumerative  machines  enumerate 
through  a  Hypothesis  Space,  testing  each  name  (so  far  as  any 
complexity  restrictions  permit)  for  compatibility  with  the 
current  sample.  Essentially,  these  are  the  machines  employed 
in  the  results  detailed  in  the  last  two  chapters.  Thus  their 
power  is  relatively  easy  to  characterize,  and  they  can  often 
be  modified  so  as  to  infer  minimal  (intrinsic  complexity) 
hypotheses.  However  they  are,  in  effect,  just  the  "Monkeys 
with  Typewriters"  prescription  for  inductive  inference,  i.e. 
try  everything.  Clearly,  for  many  natural  hypothesis  spaces, 
this  strategy  results  in  an  explosive  number  of  candidates. 
Some estimates of the seriousness of this problem are given
in <Biermann et al., 1972b>. An example is their calculation
that $\exists$ about [...] different Type 3 grammars with k
terminals and n nonterminals. And Type 3 grammars produce
only a very limited number of the (theoretically)
identifiable classes of recursive languages.

Figures such as those above have ensured that very few
"users" of enumerative inductive inference machines profess
any practical ambitions for their machines. And one of the
major efforts by Wharton <1974>, although not so intended,
serves as a warning rather than a beacon for future
practically oriented research. For even using "failure
points" to eliminate future occurrences of any hypotheses
that are known to cover failed hypotheses, "success points"
to guide future hypotheses, and tests for complete
equivalence, disconnected grammars, blocking grammars,
merging nonterminals, direct substitutability, circular
non-terminals, left and right recursion ambiguity, missing
terminals and so on, even after all of these refinements,
resulting  in  a  10^  decrease  in  the  number  of  grammars 
examined, for one context free language requiring a grammar
with only 6 rules and 2 and 5 terminals and non-terminals
respectively, Wharton's machine counts through 355,576 (!)
candidates, a figure that while admittedly better than the
2,225,706,812,694 necessary if the above refinements had not
been employed, is nevertheless horrendous. And that is
virtually the largest grammar that can be acquired by
Wharton's machine in an even remotely practical sense.
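
The two candidate counts quoted above can be compared
directly. The short check below uses only those two figures;
the variable names are illustrative.

```python
import math

refined_candidates = 355_576                # with all refinements
unrefined_candidates = 2_225_706_812_694    # without them

reduction_factor = unrefined_candidates / refined_candidates
orders_of_magnitude = math.log10(reduction_factor)
```

The refinements cut the search by a factor of roughly 6.3
million, between six and seven orders of magnitude, and still
leave several hundred thousand grammars to examine for a
six-rule target.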

Many  of  Wharton's  tests  are  designed  to  eliminate 
"worthless"  grammars  before  employing  them.  His  grammar 
generating  schema  generates  many  undesirable  types  of 
grammars  (those  equivalent  to  previous  ones,  blocked  and 
disconnected  etc.)  and  very  many  grammars  that  do  not  even 
generate the current partial enumeration.

The latter problem is where the "Logics of Discovery"
mentioned in Chapter One might conceivably be useful. An
enumerative machine equipped with a Logic of Discovery
"preprocessor" might be able to enumerate through only
hypotheses that are at least compatible with the current
partial enumeration, and thus (so the reasoning goes) obtain
much greater efficiency. A minor quibble with this is that
the results of Chapter Two show that such an approach limits
the power of the inductive inference machine (since this
approach clearly guarantees consistency). However any of the
practical efforts suffers (?) from this. More serious are
the indications that such an approach is still
combinatorially explosive. <Pudlak, 1975> details several
NP-complete problems with respect to Hajek's Logic of
Discovery. The use of "derived grammars" <Fu, 1975> in the
development of finite state grammars would seem to realize
the best that such a preprocessor could do, since it results
in an admissible hypothesis space each of whose grammars is
compatible with the current partial enumeration and at least
one of whose grammars is correct. Significantly, this
technique is considered unmanageable when more than 10
nonterminals are needed. Other more efficient methods,
starting from the "canonical derivative" or "k-tail" grammar
for example, unfortunately do not guarantee the existence of
a correct grammar in the enumerated class.

Constructive machines take a partial enumeration and,
beginning with some "candidate hypothesis", often the ad hoc
grammar, derive a "good" hypothesis for the current sample
by modifying or adding to the rules. Proofs of limiting
behavior rarely appear (<Crespi-Reghizzi, 1971> is one of the
few exceptions). The suggestions of <Solomonoff, 1964>,
<Feldman et al., 1969>, <Klein and Kuppin, 1970>, <Lee and
Fu, 1972>, <Knobe and Knobe, 1976>, <Porter, 1976>, and
<Miclet, 1976> for example, are therefore of purely heuristic
value, and indeed are easily confused with solutions to the
good encoding problem.

The many recursively unsolvable problems for Type i
(i<3) languages <cf. Hopcroft and Ullman, 1969> have meant
that much less progress has been made for these languages.
For these it might well be the case that a man-machine
interactive system similar to that in <Klein and
Kuppin, 1970>, <Lee and Fu, 1972> or <Guiho and
Jouannaud, 1977> offers the best hope for workable language
learners in the near future.


Chapter 4


FUZZY  LANGUAGES 


4.1 Introduction

Fuzzy sets are defined by replacing the usual
set-theoretic 0-1 valued characteristic function, with a
"membership" function whose range is contained in [0,1].*
The basic set operations are then defined in terms of the
respective membership functions. The membership function of
the union of two fuzzy sets with membership functions $f_1$
and $f_2$, is defined to be $\max(f_1, f_2)$, since
intuitively something is in the union of two sets at least
as much as it is in either one. The membership function of
the intersection of two fuzzy sets with membership functions
$f_1$ and $f_2$, is defined to be $\min(f_1, f_2)$, since
intuitively nothing can be in both sets any more than it can
be in either one. And finally, the membership function of
the complement of a set with membership function f, is
defined to be 1-f, since membership in the universe must be
total. From these three definitions the full range of the
usual set-theoretic operations can be defined if desired.

* This is the usual way. Other suggestions have the
membership function take as its range: an arbitrary ordered
semi-ring, in order to remove the bounded nature of the
degree of membership <Wechler, 1975>; an arbitrary lattice,
in order to allow incomparable degrees of membership <Kim et
al., 1975>; the set of fuzzy sets, as Zadeh defined them,
contained in [0,1], in order to incorporate vagueness into
the very assignation of membership <Mizumoto et al., 1976>.
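
The max-min definitions of union, intersection and complement
translate directly into code. The following Python sketch
represents a fuzzy set by its membership function; the
function names and the two example sets are illustrative
assumptions.

```python
def fuzzy_union(f1, f2):
    """Membership in the union: max of the two degrees."""
    return lambda x: max(f1(x), f2(x))


def fuzzy_intersection(f1, f2):
    """Membership in the intersection: min of the two degrees."""
    return lambda x: min(f1(x), f2(x))


def fuzzy_complement(f):
    """Membership in the complement: 1 minus the degree."""
    return lambda x: 1 - f(x)


# Two hypothetical fuzzy sets over the universe {"x", "y"}.
f1 = lambda x: {"x": 0.25, "y": 0.75}.get(x, 0)
f2 = lambda x: {"x": 0.5, "y": 0.5}.get(x, 0)

union = fuzzy_union(f1, f2)
intersection = fuzzy_intersection(f1, f2)
complement = fuzzy_complement(f1)
```

When the membership functions take only the values 0 and 1,
these three operations reduce to the ordinary set-theoretic
union, intersection and complement.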

There is a certain logical necessity to the previous
max-min definitions, which is worth realizing in the
extension of fuzzy set theory into the realm of formal
languages. Bellman <1973> showed the max-min definitions to
be inescapable given only the acceptance of five axioms¹
which do not obviously conflict with intuitive notions of
vagueness, and the logical equivalence of the statements
$A \cup (\cap) B$, with the statements that for all x
$[x \in A] \vee (\wedge) [x \in B]$.

Since a formal language is defined to be simply a set
of finite strings constructed from some finite vocabulary,
there is no immediate obstacle to the definition of fuzzy
formal languages. A fuzzy formal language is just a fuzzy
set defined on the finite strings constructed from some
finite vocabulary. That is:

Definition: A fuzzy formal language L is { (x, m(x)) :
$x \in V_T^*$ and m is some (arbitrary) real valued function
mapping $V_T^* \to [0,1]$² }, where $V_T$ is some finite set
of characters.

¹ 1. Union and Intersection are reflected in commutative,
associative, binary, and mutually distributive
operations $\wedge$, $\vee$ on [0,1].

2. $x \wedge y$, $x \vee y$ are continuous and nondecreasing
in x.

3. $x \wedge x$, $x \vee x$ are strictly increasing in x.

4. $x \wedge y \le \min(x,y)$, $x \vee y \ge \max(x,y)$.

5. $1 \wedge 1 = 1$, $0 \vee 0 = 0$.

² Non-fuzzy languages are often conflated with fuzzy
languages with 0-1 valued membership functions.

Definition: In the above definition, m is known as the
membership function for L, and the semi-membership
function sm for L is a function identical to m except
that m(s)=0 implies sm(s)=undefined.

That the development and usefulness of formal languages
has depended upon the invention of finite methods to encode
the structure exhibited by infinite sets of strings is so
obvious as to scarcely deserve mention. However, it is
precisely at this point that the concept of a fuzzy formal
language begins to experience difficulties. For non-fuzzy
languages there are a variety of satisfactory ways to
accomplish the requisite encoding. Chomsky grammars, the
most common, permit the naming of all recursively enumerable
languages. Unfortunately there is no comparably successful
notion of "grammar" for fuzzy languages. Instead there are a
number of suggestions, all flawed in one or more respects.

4.2  Grammars  for  Fuzzy  Languages 

4.2.1  Fuzzy  Grammars 

Non-fuzzy languages are so well generated by Chomsky
grammars that the obvious method to try for the generation
of fuzzy languages is the generalization of the powerful
phrase structure type of grammar. This is done in <Lee and
Zadeh,1969>. Their "fuzzy grammars" are the original
grammars devised for fuzzy languages, and still receive such
wide acceptance as to render all subsequent proposals of
only peripheral practical significance.

Definition: A fuzzy grammar is a quadruple $(V_t,V_n,S,P)$
where $V_t$, $V_n$ are terminal and non-terminal alphabets and
S is a sentence symbol, as for Chomsky Type grammars,
and the production set P is a set of expressions of the
form $(x \Rightarrow y, g)$ where $x,y \in (V_t \cup V_n)^*$ and
$g \in (0,1]$.

So in essence, a fuzzy grammar is obtained from a
Chomsky Type grammar by attaching "grammaticality
coefficients" g to the production rules. However, thus far a
fuzzy grammar is indistinguishable from, say, a
probabilistic grammar. Clearly then, a grammar's right to
the title of "fuzzy" resides primarily in its computation of
membership.

Definition: Given a fuzzy grammar $G=(V_t,V_n,S,P)$, the
base grammar is $G_{base}=(V_t,V_n,S,P')$ where
$P'=\{(x \Rightarrow y) : (x \Rightarrow y, g) \in P \text{ for some } g\}$.

Definition: Given a two-tuple $(x \Rightarrow y, g)$ belonging to
the production set of a fuzzy grammar, $x \Rightarrow y$ is the
production rule and g is the production grammaticality.

Definition: The language generated by a fuzzy grammar G
is defined by the membership function:

$$m_G(s) = \begin{cases} 0 & \text{if } s \notin L(G_{base}) \\ \sup \min(m_1, m_2, \ldots, m_n) & \text{otherwise,} \end{cases}$$

where $m_k$ is the production grammaticality
associated with a production rule
appearing in some derivation of s by
$G_{base}$, and sup is taken over all possible
derivations of s by $G_{base}$.

Some  further  definitions  that  are  of  use  later  are: 

Definition: The set of support of a fuzzy language L(G)
is $L(G_{base})$.

Definition: Let S be a fuzzy set and $\lambda \in [0,1]$,
then the $\lambda$-level set of S is the non-fuzzy set
$S_\lambda = \{x : m(x) \ge \lambda\}$.

Definition: Given a non-fuzzy set S, and $\lambda \in [0,1]$,
$\lambda S$ denotes the fuzzy set whose membership
function is given by:

$$m(x) = \begin{cases} \lambda & \text{if } x \in S \\ 0 & \text{otherwise} \end{cases}$$

Definition: Given a fuzzy grammar $G=(V_t,V_n,S,P)$, and
$0 < \lambda \le 1$, $G_\lambda = (V_t,V_n,S,P')$ where
$P' = \{(x \Rightarrow y) : (x \Rightarrow y, m) \in P \text{ for } m \ge \lambda\}$.

Definition: Given a (non-fuzzy) grammar $G=(V_t,V_n,S,P)$,
and $0 \le \lambda \le 1$, the $\lambda$-fuzzification of G,
$\lambda G$, is $(V_t,V_n,S,P')$ where
$P' = \{(x \Rightarrow y, \lambda) : (x \Rightarrow y) \in P\}$.^1

^1 Note that this is not a fuzzy grammar if $\lambda = 0$.

A simple example of a fuzzy grammar is the following
grammar for almost balanced parentheses. Define
$G=(V_t,V_n,S,P)$ where:

$V_t=\{l,r\}$, $V_n=\{S\}$

$P=\{r_1,r_2,r_3,r_4\}$

with $r_1=(S \Rightarrow lr, 1)$
     $r_2=(S \Rightarrow lSr, 1)$
     $r_3=(S \Rightarrow lS, 1/2)$
     $r_4=(S \Rightarrow Sr, 1/4)$

Then G generates a fuzzy language L with a membership
function $m_L$ defined as $m_L(l^k r^x)=1$ if k=x; 1/2 if k>x;
1/4 if x>k.
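The max-min evaluation can be checked mechanically on this example. The following is only a minimal sketch, assuming a grammar whose rules never shorten a sentential form (so that forms longer than the target string can be discarded); the function name `max_min_membership` and the encoding of rules as `(lhs, rhs, g)` triples are illustrative choices, not notation from the literature cited here.

```python
from fractions import Fraction

def max_min_membership(rules, start, target):
    """Sup over derivations of the smallest rule grammaticality used.

    rules: iterable of (lhs, rhs, g) triples. The rules are assumed to be
    non-contracting (len(rhs) >= len(lhs)), which bounds the search.
    Returns 0 when the base grammar does not derive the target.
    """
    rules = list(rules)
    assert all(len(rhs) >= len(lhs) for lhs, rhs, _ in rules)
    best = {start: Fraction(1)}   # best value found so far for each form
    frontier = [start]
    while frontier:
        form = frontier.pop()
        value = best[form]
        for lhs, rhs, g in rules:
            i = form.find(lhs)
            while i != -1:
                new_form = form[:i] + rhs + form[i + len(lhs):]
                new_value = min(value, g)   # weakest link so far
                if (len(new_form) <= len(target)
                        and new_value > best.get(new_form, 0)):
                    best[new_form] = new_value   # a better derivation exists
                    frontier.append(new_form)
                i = form.find(lhs, i + 1)
    return best.get(target, Fraction(0))

# The almost balanced parentheses grammar defined above.
PAREN_RULES = [
    ("S", "lr", Fraction(1)),
    ("S", "lSr", Fraction(1)),
    ("S", "lS", Fraction(1, 2)),
    ("S", "Sr", Fraction(1, 4)),
]

if __name__ == "__main__":
    for s in ["llrr", "llr", "lllllr", "lrr", "rl"]:
        print(s, max_min_membership(PAREN_RULES, "S", s))
```

Comparing the values for "llr" and "lllllr" shows that extra applications of the unbalancing rule leave the membership unchanged.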

Several defects with fuzzy grammars are apparent
immediately. First, the languages generated by them have
membership functions with only finite ranges since the
ranges can be no larger than the set of production
grammaticalities.^1 The second and related difficulty is
that no significant interaction can occur between the
grammaticalities of the relevant production rules during the
assignation of a membership to a string. For example, in the
above grammar the "unbalancing" production rule can only
lower a membership to 1/2, regardless of how many times it
is applied and how unbalanced the resulting string is. The
philosophy underlying fuzzy grammars, namely the "weakest
link in the chain" <Zadeh,1970> conception of derivational
validity, violates the linguistic intuition that repeated
application of a less than fully acceptable grammar rule
yields ever less grammatical sentences; scarcely grammatical
sentences may result from the collective employment of rules
that individually can scarcely be faulted.

^1 This is one of the two motivations cited for the
development of Fractionally Fuzzy Grammars in <De Palma and
Yau,1975>.

The above is not to say that fuzzy grammars have no
good points. They are simple. Moreover, Type 1 fuzzy
grammars^1 generate languages with recursive membership
functions <Thomason and Marinos,1974>. Perhaps the most
compelling argument for focussing on fuzzy grammars here,
however, is simply that they are the grammars that
completely dominate the fuzzy literature and applications
<cf. Kickert and Koppelaar,1976; Thomason,1973>.

^1 Fuzzy grammars are classified as Type 0,1,2, or 3,
depending upon where the corresponding base grammar lies in
the Chomsky hierarchy. A body of results directly analogous
to those for non-fuzzy formal languages exists <Lee and
Zadeh,1969>.

4.2.2 N-fold Fuzzy Grammars

"N-fold fuzzy grammars" <Mizumoto et al.,1973> define
"conditional grades of membership". Like all of the
proposals, they too start with a Chomsky Type grammar and
modify the form of the production set. The "N" refers to the
number of rules taken into consideration when defining a
given rule's grammaticality. In effect grammaticality is no
longer a property of a rule, but that of a rule's
application in conjunction with other rules. For example, an
element of the production set of a 1-fold fuzzy grammar is
of the form:

(x => y; m_1 if rule_1 occurs in the derivation
         m_2 if rule_2 occurs in the derivation
         ...
         m_n if rule_n occurs in the derivation)

For N=2 a production rule's grammaticality is permitted to
be conditional upon the occurrence in the derivation of any
two given rules, and so on. A string's membership is then
evaluated in the same max-min manner as for fuzzy grammars.

A certain "context sensitiveness" may be achieved^2, but
since N is always some fixed integer, the problems noted for
fuzzy grammars are merely deferred, not eliminated.

^2 For example, context free threshold grammars of this type
can generate context sensitive languages <Mizumoto et
al.,1973>.

4.2.3 Fractionally Fuzzy Grammars

Fractionally fuzzy grammars <De Palma and Yau,1975>
take a Chomsky Type grammar and attach the values of two
rational functions g, h to each rule, requiring that
$0 \le g(r) \le h(r) \le 1$ and $h(r) \ne 0$. Given such a
grammar, the membership function for the language is
evaluated by taking the supremum of $\sum g(r) / \sum h(r)$
where $\sum$ is over all the productions used in some
derivation of s, and supremum is over all possible such
derivations (and is assumed to be zero if none exist).
An example of these grammars is $G=(V_t,V_n,S,P)$ where
$V_t=\{l,r\}$, $V_n=\{S\}$, $P=\{r_1,r_2,r_3,r_4;\ g,h\}$ and

  r_1 = S => lr     g(r_1)=1    h(r_1)=1
  r_2 = S => lSr    g(r_2)=1    h(r_2)=1
  r_3 = S => lS     g(r_3)=0    h(r_3)=1
  r_4 = S => Sr     g(r_4)=0    h(r_4)=1

G generates a fuzzy language with a membership function
$m(l^k r^x)=\min(k,x)/\max(k,x)$. It can be seen that this
generates an "almost balanced" parenthesis language much
more adequately than the example used for fuzzy grammars,
with the membership of a string steadily declining as the
string becomes more unbalanced.
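The stated membership function can be reproduced by brute force. The sketch below is illustrative only: it assumes that each application of a rule contributes its g and h once to the two sums, and that every rule strictly lengthens the sentential form (true of this example, not of fractionally fuzzy grammars in general). The name `fractional_membership` and the `(lhs, rhs, g, h)` encoding are hypothetical.

```python
from fractions import Fraction

def fractional_membership(rules, start, target):
    """Sup over derivations of sum(g) / sum(h); 0 if no derivation exists.

    rules: iterable of (lhs, rhs, g, h) with 0 <= g <= h <= 1 and h != 0.
    Every rule is assumed to lengthen the sentential form, so a derivation
    of the target has at most len(target) steps and the search is finite.
    """
    rules = list(rules)
    assert all(len(rhs) > len(lhs) and 0 <= g <= h <= 1 and h != 0
               for lhs, rhs, g, h in rules)
    best = Fraction(0)
    start_state = (start, Fraction(0), Fraction(0))
    seen = {start_state}
    stack = [start_state]
    while stack:
        form, sum_g, sum_h = stack.pop()
        if form == target and sum_h > 0:
            best = max(best, sum_g / sum_h)
        for lhs, rhs, g, h in rules:
            i = form.find(lhs)
            while i != -1:
                new_form = form[:i] + rhs + form[i + len(lhs):]
                state = (new_form, sum_g + g, sum_h + h)
                if len(new_form) <= len(target) and state not in seen:
                    seen.add(state)
                    stack.append(state)
                i = form.find(lhs, i + 1)
    return best

# The fractionally fuzzy parentheses grammar defined above.
FRACTIONAL_RULES = [
    ("S", "lr", 1, 1),
    ("S", "lSr", 1, 1),
    ("S", "lS", 0, 1),
    ("S", "Sr", 0, 1),
]

if __name__ == "__main__":
    for s in ["llrr", "llr", "lllr", "lllrr"]:
        print(s, fractional_membership(FRACTIONAL_RULES, "S", s))
```

Exact rational arithmetic (`Fraction`) is used so that the results can be compared directly with min(k,x)/max(k,x).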

Designed with future applications in mind, fractionally
fuzzy grammars are easy to parse due to the built in
convenience of back tracking.^1 Type 1 fractionally fuzzy
grammars result in total recursive membership functions <De
Palma and Yau,1975>. And the class of languages generated by
fractionally fuzzy grammars properly includes the languages
generated by fuzzy grammars (with rational production
grammaticalities) <De Palma and Yau,1975>. Best of all,
fractionally fuzzy grammars do not seem to suffer the
defects noted for fuzzy grammars. The repeated application
of an only vaguely grammatical rule (i.e. one for which
h(r)-g(r) is large) drives the resultant membership towards
zero, and there can be infinitely many levels of membership
in fractionally fuzzy languages. But the converse of the
first point is that the influence of any one rule no matter
how ungrammatical may be swamped by the application of many
others. Furthermore, like the suggestions that follow,
fractionally fuzzy grammars suffer from an appearance of ad
hocness. No justification in terms of any conception of
fuzzy languages is provided for their novel calculation of
memberships. The "weakest link principle" of fuzzy grammars
may not be valid, yet it at least provides some sort of
rationale for the max-min membership calculations.

^1 To back up the grammaticalities after an unsuccessful
attempt at parsing a string, it suffices to perform the
relevant subtractions of g(r) and h(r).

Fractionally fuzzy grammars provide an opportunity to
re-examine fuzzy grammars. Although originally postulated
for the set-theoretic operations of union and intersection,
Bellman's axioms are suggestive in the case of fuzzy
grammars also. The assumption for fuzzy grammars is that
derivations are sets of production rules such that the
membership of an individual derivation in the set of
grammatical derivations corresponds to the truth value of
the statement about its constituent rules that: "r_1 is
grammatical and r_2 is grammatical and ... and r_n is
grammatical", while the membership of the set of several
alternate derivations d_i in the set of grammatical
derivations corresponds to the truth value of the statement:
"d_1 or d_2 or ... d_n is grammatical". For fuzzy grammars
these truth value calculations follow Bellman's axioms on the
operations on [0,1]. And the max-min assignation of
grammaticality necessarily follows. Fractionally fuzzy
grammars assign each rule a definite grammaticality yet
avoid this. The only point where the corresponding
calculation of truth values differs with Bellman's axioms is
the fourth axiom which states $x \wedge y \le \min(x,y)$. For
example, the use of two rules r_1 and r_2 to generate a string
s could result in m(s)=2/5 for r_1 having grammaticality 1/2
and r_2 1/3 (for instance, g(r_1)/h(r_1) = (1/3)/(2/3) and
g(r_2)/h(r_2) = (1/3)/1 give (1/3 + 1/3)/(2/3 + 1) = 2/5),
which exceeds min(1/2,1/3) = 1/3.

4.2.4  The  Grammars  of  Eugene  Santos 

Fractionally fuzzy grammars raise the suspicion that
there may be many legitimately "fuzzy" ways of assigning a
string s a real number while generating s via a Chomsky type
grammar. The work of Santos <1974> strengthens this
suspicion. Three general methods for realizing fuzzy
languages are outlined there. The first and the last of
these are just the standard stochastic (?!) and fuzzy
grammars. The second is a curious hybrid: a normal fuzzy
grammar is given a fuzzy set of sentence symbols, rather
than the usual single sentence symbol, and the value of a
given derivation is then computed by taking the product of
the grammaticalities (as for stochastic grammars) together
with the membership of the particular sentence symbol
employed to begin the derivation. If there is more than one
derivation for a string, then the string's membership is
taken to be the supremum of these values. If there is no
derivation for a string, its membership is zero. No
rationale for the use of max-product grammars is provided by
Santos, and the inclusion of stochastic grammars in the same
scheme seems rather odd initially. However, max-product
grammars avoid the flaws noted for fuzzy grammars; also,
with max-product grammars the effect of the application of a
single bad production cannot be swamped by the subsequent
application of fully grammatical productions, yet the
repeated application of slightly ungrammatical productions
can arbitrarily lower the final assessment of
grammaticality. And the similarity of max-product grammars
to stochastic grammars may not be so very unreasonable after
all since an empirical definition of the "grammaticality" of
a sentence might well be that it is the likelihood, not of
being generated (as for probabilistic languages), "but of
being judged acceptable by a member of the language
community at a particular time" <Schubert, personal
communication>.
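The max-product evaluation can be sketched in the same way. The code below is an illustration under stated assumptions rather than Santos's own formulation: rules are non-contracting, grammaticalities lie in (0,1], and the membership of the starting sentence symbol is read as one more factor of the product. The names `max_product_membership`, `PAREN_RULES` and `STARTS` are hypothetical, and the coefficients are borrowed from the earlier parentheses example purely for comparison.

```python
from fractions import Fraction

def max_product_membership(rules, starts, target):
    """Sup over derivations of (start membership) * (product of rule values).

    rules:  iterable of (lhs, rhs, g), non-contracting, with 0 < g <= 1.
    starts: dict mapping each sentence symbol to its membership.
    Returns 0 when no sentence symbol derives the target.
    """
    rules = list(rules)
    assert all(len(rhs) >= len(lhs) and 0 < g <= 1 for lhs, rhs, g in rules)
    best = dict(starts)           # best value found so far for each form
    frontier = list(starts)
    while frontier:
        form = frontier.pop()
        value = best[form]
        for lhs, rhs, g in rules:
            i = form.find(lhs)
            while i != -1:
                new_form = form[:i] + rhs + form[i + len(lhs):]
                new_value = value * g   # every application counts
                if (len(new_form) <= len(target)
                        and new_value > best.get(new_form, 0)):
                    best[new_form] = new_value
                    frontier.append(new_form)
                i = form.find(lhs, i + 1)
    return best.get(target, Fraction(0))

PAREN_RULES = [
    ("S", "lr", Fraction(1)),
    ("S", "lSr", Fraction(1)),
    ("S", "lS", Fraction(1, 2)),
    ("S", "Sr", Fraction(1, 4)),
]
STARTS = {"S": Fraction(1)}       # a one-element fuzzy set of sentence symbols

if __name__ == "__main__":
    for s in ["llrr", "llr", "lllr", "llllr"]:
        print(s, max_product_membership(PAREN_RULES, STARTS, s))
```

In contrast with the max-min evaluation, each further use of the unbalancing rule halves the membership again.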

4.3  Conclusions 

These then have been the only attempts to define
generative mechanisms for fuzzy languages. Fuzzy grammars
with their max-min scheme fail to generate many apparently
useful fuzzy languages, lacking the necessary flexibility of
assignment, yet are used almost universally. N-fold fuzzy
grammars ultimately have the same failings as fuzzy
grammars. Fractionally fuzzy grammars seem rather arbitrary
and still fail to model some grammatical intuitions. And
max-product grammars are rarely, if ever, used.

Perhaps  the  main  value  of  the  latter  types  of  grammars 
for  fuzzy  languages  rests  in  their  demonstration  that  the 
max-min  principles  of  fuzzy  set  theory  do  not  necessarily 
apply  in  any  obvious  way  to  the  application  of  production 
rules.  This  opens  the  way  to  a  general  study  of  the  ways  of 
generating  strings  and  attached  coefficients  simultaneously, 
with  the  goal  of  choosing  one  that  is  simultaneously 
powerful  and  yet  true  to  the  spirit  motivating  the  creation 
of fuzzy languages. While this is beyond the scope of this
thesis, some possible avenues for the first task will be
mentioned.

The results of associating a "cost function" with the
state transition function of a finite automaton have been
investigated under the name of "sequential decision
processes" <Ibaraki,1976;1978>. For a given string
abcd...z, the "cost" h(abcd...z) is determined as the result
of the consecutive cost evaluations corresponding to the fsm
state transitions yielding a, b, c, and so on. A sequential
decision process accepts a string s if h(s) does not exceed
some threshold value v. Not only do such machines subsume
the fsm version of stochastic, max-product and fuzzy
grammars, but they are capable of accepting any r.e. set in
$V_t^*$.
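The general shape of such a machine can be suggested by a toy example. What follows is a loose sketch built only from the description above, not Ibaraki's formal definition: final states are omitted, the machine is deterministic, and the names `sdp_cost`, `sdp_accepts` and `weakest_link` are hypothetical. The cost updates are chosen so that the cost equals one minus the max-min grammaticality, which is one conceivable way of recovering a (deterministic) fuzzy automaton.

```python
from fractions import Fraction

def sdp_cost(transitions, cost_updates, start_state, initial_cost, string):
    """Fold the per-transition cost updates over the string.

    Returns None if the machine has no transition for some symbol.
    """
    state, cost = start_state, initial_cost
    for symbol in string:
        key = (state, symbol)
        if key not in transitions:
            return None
        cost = cost_updates[key](cost)   # cost evaluation for this step
        state = transitions[key]
    return cost

def sdp_accepts(transitions, cost_updates, start_state, initial_cost,
                string, threshold):
    """Accept when the accumulated cost does not exceed the threshold."""
    cost = sdp_cost(transitions, cost_updates, start_state, initial_cost, string)
    return cost is not None and cost <= threshold

def weakest_link(g):
    """Cost update that keeps cost = 1 - (smallest grammaticality so far)."""
    return lambda cost: max(cost, 1 - g)

# A hypothetical two-state machine over {a, b}.
TRANSITIONS = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "b"): "q1"}
COST_UPDATES = {
    ("q0", "a"): weakest_link(Fraction(1)),
    ("q0", "b"): weakest_link(Fraction(1, 2)),
    ("q1", "b"): weakest_link(Fraction(1, 4)),
}

if __name__ == "__main__":
    for s in ["aab", "abb", "ba"]:
        print(s, sdp_accepts(TRANSITIONS, COST_UPDATES, "q0", Fraction(0),
                             s, Fraction(1, 2)))
```

With threshold v, this particular machine accepts exactly the strings whose max-min membership is at least 1 - v; other choices of update function give other evaluation schemes.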

Perhaps a fuzzy language should be considered, not as a
language per se to be generated by a grammar, but instead as
a mapping or a "translation" from strings to a grammatical
scale. Translations from one formal language to another
have been investigated under the name of "syntax-oriented
translation" <Abramson,1973> or "generalized syntax-directed
translation schemes" <Aho and Ullman,1973>. Essentially such
translations are algorithms that analyze, and perform some
transformation on, sentences from some class of languages.
They do this by associating one or more transformations with
each production rule and non-terminal symbol. Some common
examples of their use include the translation of certain
strings of zeros and ones representing the positive integers
as sums of Fibonacci numbers into their decimal
representation, and the differentiation of polynomial
expressions. Although currently the theory refers to context
free languages, Abramson <1973> believes that the scope will
be extended eventually.


Chapter  5 


FUZZY  LANGUAGE  LEARNING 


5.1  Learning  Fuzzy  Languages 

5.1.1 Previous Work

At the time of writing, <Tamura and Tanaka,1973> is the
only paper ostensibly addressed to the problem of learning
fuzzy formal languages. This is a surprising situation
considering the very partial nature of their solution,
particularly given the amount of material that exists on
virtually every other conceivable "fuzzy topic" <cf. Gaines
and Kohout,1977>, and the repeated expressions of interest
in some method for learning fuzzy languages from a "training
set" <cf. Thomason,1973; De Palma and Yau,1975>.

Despite the title -"Learning of Fuzzy Formal
Languages"- and much of the intuitive motivation and
explanation, Tamura and Tanaka's paper is in fact devoted to
the approximation of a non-fuzzy formal language L by
successively hypothesizing ever "better" fuzzy grammars.
Informally stated, their goal is the development of a
procedure that, given an initial fuzzy grammar whose base
grammar includes a grammar for the target language L,
outputs a sequence of fuzzy grammars whose languages have
membership functions that approach L's characteristic
function in the limit^1. While this goal can be modified to
accommodate fuzzy target languages, not only does their
solution break down, but it is shown that such a goal is, in
a certain sense, futile.

The problem is restricted to recursive target languages
and the fuzzy grammar G initially provided is decidable
(i.e. Type 1,2 or 3). Each string of a partial text is
parsed, and although parsing ambiguities lead to some
complications, essentially the procedure is to take the
fuzzy grammar hypothesized for the previous partial text and
apply a standard linear learning scheme to its production
grammaticalities on the basis of a production rule's
participation in the latest series of parses, i.e., if S is
a set of rules necessary and sufficient for some parse of
any string s in the current partial text, and $g_n(r)$ is the
grammaticality of rule r in the previously hypothesized
fuzzy grammar, then the updated grammaticality of rule r is:

$$k \cdot g_n(r) + (1-k) \cdot ch_S(r)$$

where $k \in (0,1)$ is arbitrary and $ch_S$ is the
characteristic function of S.
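The behaviour of this linear scheme is easy to see numerically. The following sketch applies the update repeatedly with a fixed rule set S, as would happen when the same finite sample is presented again and again; the function name `update_grammaticalities`, the rule labels and the starting values are hypothetical.

```python
def update_grammaticalities(g, used_rules, k):
    """One step of the update k*g_n(r) + (1-k)*ch_S(r) for every rule r.

    g:          dict mapping rule names to current grammaticalities
    used_rules: the set S of rules taking part in the latest parses
    k:          arbitrary constant with 0 < k < 1
    """
    assert 0 < k < 1
    return {rule: k * value + (1 - k) * (1 if rule in used_rules else 0)
            for rule, value in g.items()}

if __name__ == "__main__":
    g = {"r1": 0.5, "r2": 0.5, "r3": 0.5, "r4": 0.5}
    S = {"r1", "r2"}
    for _ in range(20):
        g = update_grammaticalities(g, S, k=0.5)
    print(g)
```

Rules in S are driven towards 1 and all others towards 0, so that for any fixed level strictly between 0 and 1 the rules above that level eventually coincide with S.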

Using this method Tamura and Tanaka claim to be able to
attain their previously stated goal of approaching the
target language in the limit. What they prove is
considerably different.

^1 The analytical, not the number theoretic, limit.

THEOREM <Tamura and Tanaka,1973> Given some partial text T
of a recursive language L, and an initial recursive fuzzy
grammar G such that $L \subseteq L(G_{base})$, then:

1) $\forall \lambda \in (0,1)\ \exists N$ such that $n>N$ implies
$L((G_n)_\lambda) = L(G_{pos})$

2) $L((G_i)_{base})$ is constant $\forall i$

3) $\{s \text{ such that } (s,1) \in T\} \subseteq L(G_{pos}) \subseteq L(G_{base})$.

where $(G_n)_\lambda$ is the $\lambda$-level set of the grammar
hypothesized after T has been input n times, and $G_{pos}$
is a subgrammar of $G_{base}$, i.e. a rule r is in $G_{pos}$
iff r participates in some parse of a string in T.

So  their  result  concerns  grammar  convergence  for  repeated 
presentation  of  a  finite  sample  of  a  language,  rather  than 
convergence  for  a  presentation  of  the  entire  language. 


5.1.2  A  New  Outlook 

The functionally oriented presentation of language
learning given in Chapter Three extends in an obvious
manner to fuzzy languages. A (non-fuzzy) language has a 0-1
valued characteristic function. A fuzzy language has a real
valued membership function^1. A (non-fuzzy) language has a
semi-characteristic function. A fuzzy language has a
semi-membership function. All the usual set theoretic relations
and operations, such as equality, intersection and
containment, have fuzzy equivalents. So then, much as for
(non-fuzzy) languages, learning a fuzzy language L can be
considered as learning either L's membership or
semi-membership function, with the new definitions of
identification, matching, informant, text^2 and so on being
obvious extensions of the former ones. Consequently the
functional results discussed in Chapter Two apply to fuzzy
language learning just as they do to non-fuzzy language
learning.

However, the acquisition of grammars, rather than
programs, is such a standard requirement in language
learning studies that the problem is universally known as
the "grammatical inference problem". Seen in this light the
problem still exists for fuzzy languages. Because of their
simplicity and wide acceptance, fuzzy grammars are used for
the remainder of this section. The fuzzy languages
corresponding to fuzzy grammars will occasionally be denoted
as the fuzzy (grammar) languages.

^1 This glosses over the fact that the functions are no
longer number theoretic, but rather have real valued ranges
in [0,1]. This transition poses no difficulties if (and only
if) their ranges are restricted to the computable real
numbers. This seems to be a very reasonable assumption.

^2 Note that (fuzzy) text in this sense includes exact
grammaticalities for each string.


5.1.3 Assigning Grammaticality to Known Rules

The obvious way to proceed is to modify the approach in
<Tamura and Tanaka,1973> so as to accommodate non-trivially
fuzzy languages while remaining within the framework
outlined in the previous chapters. This leads to results
like the next theorem, which may be viewed as facilitating
the assignment of grammaticalities in practical situations
where some grammar is already known that includes a base
grammar for the target language.

THEOREM 1 The class of Type 0 fuzzy (grammar) languages can
be identified in the limit given text, assuming that the
inductive inference machine is given a Type 0 unambiguous
grammar G that includes a base grammar for the target
language's set of support.

Proof: A procedure will be given and then shown to work.

Call the text used for L, $\bar{L}$, and its n-th partial
enumeration $\bar{L}_n$.

To begin with, 0-fuzzify G, calling the result GF.

For $\bar{L}_n$:

For an element (s,m(s)) appearing in $\bar{L}_n$ but not in
$\bar{L}_{n-1}$, parse s by G. Given that a production rule r
participates in the derivation of s, examine the
current grammaticality g of r in GF. If g < m(s) then
set g to m(s). Return as the hypothesis $H_n$ the fuzzy
grammar obtained by removing all pairs of the form
(r,0) from the production set of GF.

This procedure works since:

*There can be only finitely many distinct values m(s)
appearing in the sample presentation.

*The current grammaticality of any rule in $H_n$ that
confers its grammaticality to some string in L's set
of support is $\le$ its grammaticality in any grammar
derived from G that generates the target language.

*There is a partial text $\bar{L}_n$ past which the rules in
the production set of $H_n$ are adequate to generate the
set of support of the target language.

*There is a partial enumeration $\bar{L}_n$ past which $H_n$
remains constant, since a rule's grammaticality can
increase only a finite number of times and there are
only a finite number of rules in G.

*$H_n$ is by construction a fuzzy grammar.

Suppose there is no n past which $H_n$ generates the target
language. Let $H_n$ be the grammar finally settled upon by
the fourth observation. Let s be a string assigned
different memberships in the target and hypothesized
languages. If m(s)=0 then $H_n$ has a production rule r
that is not in any subgrammar G' of G for the target
language. Since s is never parsed, some other string $s_1$
in L's set of support must have been responsible for the
introduction of a non-zero grammaticality for r in $H_n$.
But this implies that there are two parses of $s_1$, one
involving r and the other only rules in G'. This
contradicts the fact that the grammar G is unambiguous.
Assume $m(s) \ne 0$. If $m(s) < m_{H_n}(s)$ then, since the rules
are fixed and unambiguous, a contradiction arises to the
second observation above. However, if $m_{H_n}(s) < m(s)$ then
some rule r in $H_n$ used in the derivation of s has a
grammaticality < m(s). But s must appear in some partial
enumeration, forcing all the rules appearing in its
derivation to have grammaticalities of at least m(s)
after this. Therefore $H_n$ is not the final grammar
hypothesized. This contradicts the assumption that $H_n$ is
the final grammar settled upon. Since this has exhausted
the possibilities, $H_n$ must generate a language
equivalent to the target language. //
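The procedure in the proof can be sketched as follows. This is a minimal illustration, not an efficient parser: it assumes non-contracting rules, finds derivations by brute-force search, and relies on G being unambiguous so that whichever derivation is found uses the right rules. The grammar `RULES`, its rule names, and the sample `TEXT` are hypothetical; the grammar is an unambiguous variant of the parentheses example with one spurious rule added.

```python
from fractions import Fraction

def find_derivation(rules, start, target):
    """Return the names of the rules used in a derivation of target, or None.

    rules: dict mapping rule names to (lhs, rhs) pairs, assumed
    non-contracting so that forms longer than the target can be pruned.
    """
    seen = {start}
    stack = [(start, ())]
    while stack:
        form, used = stack.pop()
        if form == target:
            return used
        for name, (lhs, rhs) in rules.items():
            i = form.find(lhs)
            while i != -1:
                new_form = form[:i] + rhs + form[i + len(lhs):]
                if len(new_form) <= len(target) and new_form not in seen:
                    seen.add(new_form)
                    stack.append((new_form, used + (name,)))
                i = form.find(lhs, i + 1)
    return None

def learn_grammaticalities(rules, start, text):
    """Return the hypothesis H_n produced after each element of the text."""
    gf = {name: Fraction(0) for name in rules}   # the 0-fuzzification GF
    hypotheses = []
    for s, m in text:
        for name in find_derivation(rules, start, s) or ():
            if gf[name] < m:
                gf[name] = m                     # raise g to m(s)
        # Drop all pairs (r, 0) to obtain the current hypothesis.
        hypotheses.append({n: g for n, g in gf.items() if g > 0})
    return hypotheses

RULES = {
    "extra_l":  ("S", "lS"),
    "to_B":     ("S", "B"),
    "nest":     ("B", "lBr"),
    "pair":     ("B", "lr"),
    "spurious": ("B", "rl"),   # not needed for the target language
}
TEXT = [("lr", Fraction(1)), ("llr", Fraction(1, 2)),
        ("llrr", Fraction(1)), ("lllrr", Fraction(1, 2))]

if __name__ == "__main__":
    for hypothesis in learn_grammaticalities(RULES, "S", TEXT):
        print(hypothesis)
```

On this sample the hypothesis stops changing after the third element, and the spurious rule never receives a non-zero grammaticality.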

Note that this result seems very much stronger than any
results cited for non-fuzzy languages. The provision of a
grammar that contains a correct base grammar is responsible
for this. The theorem clearly holds also for the Type 1,2,3
restatements.

The unambiguous restriction in the above result is
inessential; however, there is then no longer nearly such an
efficient updating procedure due to the masking effect of
the max operator. That is, the participation of a rule r in
a derivation of some string s no longer implies that the
grammaticality of r $\ge$ m(s). This forces what is essentially
a trial and error assignment of grammaticalities. One
plausible, albeit highly inefficient, solution might be to
begin generating all possible parses for each string s while
simultaneously (by dovetailing the operations) generating a
tree of altered GFs by the previous method modified so that
a check is made as to whether the current grammaticality of
at least one rule in each parse is $\le$ m(s), and the branch is
eliminated if this is not the case. Actually, a restriction
to unambiguous grammars is not uncommon in other language
learning studies <cf. Horning,1969>.


5.1.4  Can  Coefficient  Assignment  Methods  be  Extended? 

While  the  previous  method  may  aid  in  the  construction 
of  fuzzy  grammars  in  certain  instances,  in  general  the  a 
priori  assumption  of  a  grammar  including  a  base  grammar  for 
the  target  language  is  unwarranted.  The  natural  response  is, 
as  Tamura  and  Tanaka  suggest,  to  attempt  the  addition  of  a 
"front  end"  that  discovers  this.  That  is,  much  as  for 
stochastic  languages,  the  problem  is  broken  into  two  parts. 
The  first  involves  the  acquisition  of  a  non-fuzzy  grammar 
containing  a  base  grammar  that  generates  the  target 
language's  set  of  support;  and  the  second  involves  the 
acquisition  of  the  fuzzy  coefficients.  Unfortunately  there 
is  good  reason  to  think  that  such  an  approach  is  not 
feasible. With the exception of <Crespi-Reghizzi,1971>, all
language learning studies are concerned with finding
grammars that generate merely the strings of the target
language. This focus arises naturally from the definition of
formal languages as mere sets of strings with no associated
derivational histories nor "meaning". The possibility of
assigning grammaticalities to the production rules is
crucially affected by this. For example, if the target
language has six distinct levels of membership, no grammar
with  only  four  production  rules  can  possibly  generate  it, 
yet  many  such  4-rule  grammars  may  generate  the  same  set  of 
support.  While  a  solution  may  be  had  involving  the 
interaction  of  the  two  sub-problems  (e.g.  When  a  conflict  of 
this  nature  occurs,  start  solving  problem  1  again.),  such  an 
approach  seems  very  clumsy  at  best. 


5.1.5  A  General  Solution

The previous discussion suggests that the acquisition
of a fuzzy grammar should come about through the
simultaneous acquisition of both rules and coefficients. An
extension to a result for fuzzy sets states that:

THEOREM <Zadeh, 1970> If G is any fuzzy Type i (i = 0,1,2,3)
grammar then L(G) = UNION lambda L(G_lambda),

where UNION stands for the union, over lambda = the
production grammaticalities appearing in G, of the fuzzy
sets lambda L(G_lambda);

and the G_lambda are all of Type i <Zadeh, 1970>.
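
The decomposition can be checked on a small example. The sketch
below is illustrative only and makes two assumptions that are not
spelled out in the text: membership is computed by the max-min
rule over derivations, and G_lambda is taken to be the non-fuzzy
grammar that keeps exactly the rules of grammaticality ≥ lambda.
The rule names, grammaticalities and derivation table are
hypothetical toy data standing in for a real parser.

```python
# Hypothetical toy data: rule grammaticalities, and for each string the
# rule sets of its derivations (standing in for a real parser).
GRAMMATICALITY = {"p1": 1.0, "p2": 0.6, "p3": 0.3}
DERIVATIONS = {
    "a": [{"p1"}],
    "ab": [{"p1", "p2"}, {"p3"}],
    "abb": [{"p2", "p3"}],
}


def membership(s):
    """Max over derivations of the min rule grammaticality (max-min rule)."""
    derivs = DERIVATIONS.get(s, [])
    return max((min(GRAMMATICALITY[r] for r in d) for d in derivs), default=0.0)


def in_level_language(s, lam):
    """True if s has a derivation using only rules of grammaticality >= lam."""
    return any(all(GRAMMATICALITY[r] >= lam for r in d)
               for d in DERIVATIONS.get(s, []))


def membership_via_union(s):
    """Membership of s in the union, over lambda, of lambda L(G_lambda)."""
    levels = set(GRAMMATICALITY.values())
    return max((lam for lam in levels if in_level_language(s, lam)), default=0.0)
```

For every string in the table the two functions agree; for
instance "ab" receives 0.6 either way, since its better
derivation survives in the level grammars for 0.6 and 0.3 but
not in the one for 1.0.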

This fact suggests a general solution, namely that of
acquiring a separate grammar for each non-zero lambda-level
set of the target language, by standard function (language)
learning techniques, lambda-fuzzifying them, and then
outputting the union of these fuzzy grammars. This approach
does in fact work; however, the details establishing this
for each learning criterion are very tedious. The next
theorem provides a cleaner technique based upon essentially
the same idea. A function's domain is now assumed to be
included in some V_T* rather than N.

THEOREM 2 Given any partial recursive semi-membership
function f with finite range R, ∃, uniformly in f, a fuzzy
Type 0 grammar G such that sm_L(G) = f.

Proof: For each r ∈ R, assuming R ≠ ∅:

a) Construct a Type 0 grammar for f^-1(r). This
construction is uniformly effective in f since f^-1(r)
is recursively enumerable^1 and hence is the domain
of a partial recursive function t effectively
constructible from f <Rogers, 1967> and hence is the
language generated by a Type 0 grammar G_r that is
effectively constructible from t <Hopcroft and
Ullman, 1969>.

b) Construct a fuzzy grammar F_r by r-fuzzifying G_r.

If R = ∅ then let G be the null grammar. Otherwise, as
the final step union^2 the F_r to obtain G.

The above procedure clearly works for f = the everywhere
divergent function.


1  Begin calculating f(s_1), f(s_2), f(s_3), ..., dovetailing
the calculations, and whenever a calculation of f(s_i)
terminates and yields r, output s_i.

2  This is done exactly as described by Hopcroft and Ullman
<1969> for Chomsky Type grammars. Informally stated, the
operation consists of ensuring that each F_r has a unique
non-terminal vocabulary (in order to avoid "derivational
cross-overs") save for a common sentence symbol, and then
unioning the individual terminal and non-terminal
vocabularies and production sets.




Suppose f(s)=r. Then s ∈ f^-1(r) and so s is generated
by G_r and has membership r in L(F_r). Consequently the
membership of s in L(G) is at least r. But if this
membership is greater than r, then s is generated as
well by some G_u ≠ G_r, which in turn implies that f(s)=u
≠ r, which is a contradiction. Therefore the membership
of s in L(G) is exactly r.

Suppose f(s)=undefined. Then s will not be generated by
any G_r and hence will have an undefined semi-membership
in L(G). //
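
The dovetailed enumeration of f^-1(r) that step (a) relies on
can be sketched as follows. This is a minimal illustration, not
part of the construction itself: the function f_computation, its
values, and the one-letter vocabulary are hypothetical, and a
"computation step" is modelled by one resumption of a Python
generator.

```python
from itertools import count, islice


def f_computation(s):
    """Hypothetical partial semi-membership function, run step by step.

    Diverges on strings of odd length; otherwise halts after len(s) steps
    with value 1.0 for the empty string and 0.5 for other even lengths.
    """
    if len(s) % 2 == 1:
        while True:          # never halts: f(s) is undefined
            yield
    for _ in range(len(s)):
        yield
    return 1.0 if s == "" else 0.5


def strings():
    """Enumerate the strings over the one-letter vocabulary {a}."""
    for n in count():
        yield "a" * n


def enumerate_preimage(r):
    """Dovetail f(s_1), f(s_2), ... and yield each s whose run halts with r."""
    running = []
    source = strings()
    while True:
        new_string = next(source)
        running.append((new_string, f_computation(new_string)))
        still_running = []
        for s, computation in running:
            try:
                next(computation)              # one more step of f(s)
                still_running.append((s, computation))
            except StopIteration as halt:      # f(s) halted with halt.value
                if halt.value == r:
                    yield s
        running = still_running
```

Here `list(islice(enumerate_preimage(0.5), 2))` yields the first
two members of f^-1(0.5). The divergent computations are simply
carried along forever, which is why the enumeration is only
semi-decidable.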


The converse of this theorem is immediate. Moreover,
although the theorem has been stated with reference to Type
0 fuzzy grammars and partial recursive semi-membership
functions, only slight modifications are needed for the
other types of Chomsky grammars and corresponding functions.
A function f is said to be "computable by a finite
[pushdown] [linear bounded] automaton" M if M accepts
precisely {(s,f(s)): s ∈ V_T* and f(s) is defined}.

COROLLARY: Given any semi-membership function f, with finite
range R, such that f is computable by a finite
[non-deterministic pushdown] [linear bounded] automaton, ∃,
uniformly in f, a fuzzy Type 3 [2] [1] grammar G such that
sm_L(G) = f.

Proof: By analogy with the proof of the theorem, replacing
Turing machines by finite, pushdown or linear bounded
automata respectively. In somewhat more detail, for finite
automata, the altered lines of the former proof are as
follows. Let M be a finite automaton that accepts f.
Construct a Type 3 grammar for f^-1(r). This can be done
effectively since f^-1(r) is accepted by a finite automaton
<Hopcroft and Ullman, 1969>, namely the one that, when given
s, gives (s,r) to M. This step, for the three different
types of machines, rests upon the fact that if fs is a
finite state transducer that adds a fixed symbol to its
input and g a finite [pushdown] [linear bounded] automaton,
then g composed with fs is a finite [pushdown] [linear
bounded] automaton. //

The method is now simplicity itself. To identify
[match] a class of fuzzy languages using a hypothesis space
containing fuzzy grammars, construct an inductive inference
machine that identifies [matches] (in the linguistic sense,
i.e. extensions are not permissible) the corresponding class
of partial recursive (semi-membership) functions, and pass
its hypotheses to a Turing machine that translates the
hypothesized program index into the corresponding fuzzy
grammar via the procedure outlined above. This permits most
of the (non-fuzzy) language learning results to be restated
easily for fuzzy (grammar) languages using hypothesis spaces
containing only fuzzy grammars. For example, call the class
of sets of fuzzy languages generated by fuzzy grammars of
type i FGRAM_i, then:

COROLLARY 1: FGRAM_0 is identifiable in the limit given
primitive recursive text.

COROLLARY 2: FGRAM_1 is identifiable given informant.

COROLLARY 3: The total recursive subclass of FGRAM_0 that can
be matched given informant, is strictly larger than the
total recursive subclass of FGRAM_0 that can be identified
given informant.

A possibly undesirable feature of this solution
technique is that the grammars hypothesized may be ambiguous
despite the fact that there is an unambiguous grammar for
the target language. Suppose the target language L has three
non-zero levels of grammaticality, g_1 > g_2 > g_3, and an
unambiguous grammar G generating L has in its production set
three rules p_1, p_2, p_3 corresponding to these
grammaticalities. Suppose also that p_1 and p_2 together
generate a string s. The grammars G_2, G_3 created for g_2,
g_3 respectively could very well both contain p_1 and p_2,
and in the final fuzzy union this would create two
derivation paths for s.

5.2  "Very  Approximate"  Learning  Criteria 

Chapters Two and Three reviewed the various ways that
have been suggested to permit a language learner to
hypothesize languages that are almost, but not quite, the
same as the target language. In each case this simplified
the task and permitted the learning of larger classes of
languages. This is desirable due to the limitations noted
for exact limiting criteria. Just as there is a need for
theories of fuzzy or approximate deductive reasoning
<cf. Zadeh, 1977>, so too is there a need for theories of
fuzzy inductive reasoning.

5.2.1  Order-Identification 

The  approximation  of  a  (possibly  fuzzy)  language  by 
fuzzy  grammars,  in  the  manner  stated  at  the  beginning  of 
section  5.1.1  as  the  goal  of  Tamura  and  Tanaka,  appears  to 
provide  a  promising  new  limiting  criterion  for  approximate 
learning.  This  is  an  illusion  however.  It  confers  no 
advantages  over  the  standard  exact  limiting  criteria  even 
for  non-fuzzy  languages,  as  is  apparent  from  the  next 
theorem. 

Definition: An inductive inference machine M order-matches
a fuzzy language L in the limit if:

1) M's hypothesis space contains only fuzzy Type 1
grammars

2) for every text [informant] of L ∃ N such that if
[m_L(s_1) > m_L(s_2)] then [m_Hn(s_1) > m_Hn(s_2)] ∀ n>N,
where (H_n) is the sequence of M's hypotheses.

3) (H_n base) stabilizes

THEOREM 3 If an inductive inference machine M order-matches
a class C of (non-fuzzy) languages given text then ∃,
uniformly in M, a machine M' that matches C in the limit.

Proof: Suppose M order-matches L. Let (H_n) be the sequence
of M's hypotheses given a text for L. Call the nth partial
text L_n. Define M' by the following program description.




For L_n:

    For all (s,1) ∈ L_n, compute all parses of s, and
    thereby the membership assigned to s by H_n. Output
    H_n', defined to be the 1-fuzzification of (H_n)_lambda,
    where lambda = the minimum value obtained from the
    above membership calculations.

∀n, let lambda_n be the least grammaticality assigned by H_n
to any member of L. Since M order-matches L, ∃ a partial
text number N and a base grammar BG such that ∀n>N,
H_n base = BG, and all strings not in L have memberships <
lambda_n in L(H_n).

Claim: The value of lambda used by M' to compute H_n' is
lambda_n for n sufficiently large.

For any derivation d of a string, let r_d denote the set of
rules used in d. For any s ∈ L, let R_s denote the group of
rule sets {r_d: d is a derivation of s using grammar BG};
i.e. R_s contains every set of rules in BG that can be used
for deriving s. Finally, let R_L be the collection of all
such groups of rule sets for strings in L; i.e. R_L = {R_s:
s ∈ L}. R_L is finite since the set of rules in BG is finite.
Hence ∃ some finite set of strings S ⊆ L such that R_L =
{R_s: s ∈ S}. The strings in S all appear in L_n for n
sufficiently large, say n>N'>N. Hence ∀n>N', and every
t ∈ L, some s appears in L_n with R_t = R_s. Consequently, if
t receives the minimal grammaticality for L, so does s, and
therefore the lambda used by M' to compute H_n' is lambda_n
as required.//

Tamura and Tanaka apparently wished to assign
grammaticalities to the production rules of an essentially
stable Type 1 base grammar in such a way as to force the
membership values of the resultant languages to approach the
target language's values in the limit. They were concerned
only with text sample presentations since the class of
languages obtainable from Type 1 grammars is already
identifiable in the limit given informant (since they form a
r.e. class of recursive languages). In short, their hope
presumably was to enhance the currently rather dismal
performance of language learners given text. However, a
machine embodying this goal would order-match the target
language and so, by the last theorem, could be replaced by
an (exact) matching algorithm. Consequently such a machine
would still be extremely limited in its power with respect
to text, as the results cited in section 3.5 demonstrate.

5.2.2  E  and  E_range-identification

Conceptually, fuzzy languages seem to demand a notion
of equality that permits an infinite number of differences
between target and hypothesis as long as the overall
proportion is not "too large". In another context,
Tsichritzis <1971> notes that such "fuzzy"^1 functional
approximations can significantly simplify many problems. The
expectation that this should be true for language learning
is strengthened by the results cited in Chapters 2 and 3.
Consequently two criteria of such "very approximate"
learning, E and E_range-identification, are proposed below.

1  This term receives various interpretations. Tsichritzis
<1971> and Santos <1974>, for example, use it essentially to
indicate an assignment of coefficients free from the
constraints of the axioms of probability. The term is used
only informally here, with any technical usage being
reserved for situations deriving more obviously from Zadeh's
max-min membership definitions.

In  the  interests  of  simplicity,  the  following  analysis 
is  given  initially  in  terms  of  functions  rather  than 
languages,  with  the  implications  for  language  learning 
discussed  later  in  the  chapter.  The  following  definitions 
apply  only  to  total  functions. 

Definition: Given two functions f and g, DIF(f,g,n) = {x:
f(x) ≠ g(x) and x ≤ n}.

Definition: Given two functions f and g, DENSDIF(f,g) =
lim_n sup #DIF(f,g,n)#/n.

Definition: A function f_v is an E-variant of a function
f if DENSDIF(f,f_v) ≤ E.

Definition: An inductive inference machine M E-identifies
(0 ≤ E ≤ 1) a function f in the limit if for every
enumeration of f, ∃ i such that M converges to i and t_i
is an E-variant of f.

Definition: E-ID is the class of E-identifiable sets of
total recursive functions.
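
For a finite n these quantities are directly computable, which
may help fix the definitions. The sketch below assumes N starts
at 0; the functions f and f_v are hypothetical examples, with
f_v chosen to differ from f exactly at the powers of two, so
that the ratio #DIF(f,f_v,n)#/n shrinks as n grows even though
the two functions disagree infinitely often.

```python
def dif(f, g, n):
    """DIF(f, g, n): the points x <= n at which f and g disagree."""
    return {x for x in range(n + 1) if f(x) != g(x)}


def dif_ratio(f, g, n):
    """#DIF(f, g, n)#/n, the quantity whose lim sup is DENSDIF(f, g)."""
    return len(dif(f, g, n)) / n


def f(x):
    # Hypothetical target function.
    return x % 3


def f_v(x):
    # Hypothetical variant: differs from f exactly at the powers of two.
    is_power_of_two = x > 0 and (x & (x - 1)) == 0
    return f(x) + 1 if is_power_of_two else f(x)
```

A lim sup cannot be computed from finitely many values, so
`dif_ratio` only suggests the limiting behaviour; here it equals
(log2(n) + 1)/n whenever n is a power of two.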

Remarks

1) E-variants of E-variants of a function f are not
necessarily E-variants of f, for E ∈ (0,1); however,
0-variants of 0-variants of f are 0-variants of f.
This suggests that 0-variants are better behaved than
E-variants in general, and so should be stressed.

2) If a total function f almost everywhere equals a
total function g then f is a 0-variant of g.

3) 0-variants of a total function f are not necessarily
almost everywhere equal to f. For example, given f, define
the function f_v by:

    f_v(x) = f(x)     if x ≠ 2^n, n ∈ N
           = f(x)+1   otherwise

4) Whereas finite variants of recursive functions are
again recursive functions, 0-variants of recursive
functions are not necessarily recursive. And whereas the
finite variants of a total recursive function are
recursively enumerable,

Proposition: Given any total recursive function f, the set of
total recursive 0-variants is not recursively enumerable.

Proof: By contradiction. Let f_v1, f_v2, ... be some such
effective listing. Given any total recursive function r, ∃
f_vj such that:

    f_vj(x) = r(k)   for x = 2^k, k = 0,1,2,...
            = f(x)   otherwise

since such a function is a total recursive 0-variant of f by
construction. Define the new sequence of functions n_i by:

    n_i(x) = f_vi(2^x)

By construction then, (n_i) is an effective listing of R.
This contradicts the well-known fact that R is not
recursively enumerable.//

THEOREM 4 0-ID strictly includes ID*.

Proof: Containment is immediate from the previous remarks.
Let C be a singleton set containing one (arbitrary) total
recursive function f. Define C' as follows.

    C' = {f_r: f_r(x) = r(k)   for x = 2^k, k = 0,1,2,...
                      = f(x)   otherwise
          where f ∈ C, r ∈ R}.

Intuitively, C' contains all the total recursive functions
obtainable from f by inserting the values of other recursive
functions at intervals of exponentially growing length. This
construction is not effective, there being no effective
listing of R, but this does not matter.

By construction, C' is a set of total recursive 0-variants
of f.

C' is trivially 0-identifiable by the inductive inference
machine that always returns an index for f.

Suppose C' is almost everywhere (*) identifiable by some
machine M. Define the new machine M' by the following
program description:

    Given g_n (wnlg assume increasing enumerations):

    Define SpecialEnum = (f(0), g(0), g(1), f(3), g(2),
    f(5), f(6), f(7), g(3), f(9), ..., g(n))

    Intuitively, if g ∈ R then SpecialEnum is a partial
    enumeration of f_g ∈ C'. Its definition is clearly
    uniform in f and g_n.

    Let M([SpecialEnum]) = i,
    and define t_j = lambda x[t_i(2^x)].

    Output j.

Intuitively, t_j is a program for a finite variant of g
whenever (if ever) t_i is a program for a finite variant of
f_g. Consequently M' almost everywhere (*) identifies R. This
contradicts the previously cited results that ID* is
included in MATCH which is strictly included in R. Hence C'
∉ ID* and so 0-ID strictly includes ID*.//
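
The interleaving behind SpecialEnum, and the read-back step
t_j(x) = t_i(2^x), can be sketched as follows. This is only an
illustration of the encoding: the helper names are hypothetical,
the list g_values stands for the values g(0), ..., g(n) seen so
far (assumed non-empty), and ordinary Python functions stand in
for program indices.

```python
def is_power_of_two(x):
    return x > 0 and (x & (x - 1)) == 0


def special_enum(f, g_values):
    """Values at positions 0..2**n: g(k) at position 2**k, f(x) elsewhere.

    g_values is the non-empty list [g(0), ..., g(n)].
    """
    last = 2 ** (len(g_values) - 1)          # position carrying g(n)
    return [g_values[x.bit_length() - 1] if is_power_of_two(x) else f(x)
            for x in range(last + 1)]


def recover(t_i):
    """t_j = lambda x. t_i(2**x): read the embedded function back out."""
    return lambda x: t_i(2 ** x)
```

With f(x) = -x and g values [10, 11, 14, 19], `special_enum`
returns [0, 10, 11, -3, 14, -5, -6, -7, 19], which matches the
pattern (f(0), g(0), g(1), f(3), g(2), f(5), f(6), f(7), g(3)).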

Remark: The same argument works for the corresponding
definition of 0-matching and MATCH*.

THEOREM 5 For every h ∈ R, ε > 0, 0 ≤ E < 1, ∃ M, uniformly in
h, such that M reliably^1 E+ε-identifies the class of
E-variants of the h-easy functions.

Proof: Wnlg we assume increasing enumerations. The proof
proceeds via two lemmas.

Lemma 1 For every h ∈ R, ∃ M, uniformly in h, such
that for all g, [∃ f such that (f is h-easy) and
(#DIF(f,g,n)#/n ≤ E+ε ∀n)] implies
(M reliably E+ε-identifies g).

Proof: Define M by the following program description.

Set FLAG = FAILURE, and begin enumerating NxN.


1  The definition for E-identification is analogous to that
for almost everywhere identification.




If FLAG = FAILURE, then:

    Find the next pair (i,m) in the enumeration of
    NxN. Output i. For all x such that (x,g(x)) ∈
    g_n, check whether or not T_i(x) ≤ max{m, h(x)}.
    If the check is satisfied, then check whether or
    not, for these x, #{x: g(x) ≠ t_i(x)}#/n ≤ E+ε.
    If either of these checks fail, set FLAG to
    FAILURE; otherwise set FLAG to SUCCESS.

Else if FLAG = SUCCESS then:

    output the index hypothesized for g_{n-1} and do
    the checks and flag assignments as described
    above for this old hypothesis. //

Intuitively,  the  partial  recursive  functions  are 
being  enumerated  and  checked  as  to  whether  or  not 
they  are  "almost  compatible"  with  the  target.  The 
complexity  check  using  h  ensures  that  the  inductive 
inference  machine  knows  when  to  stop  calculating  with 
any  particular  partial  recursive  function.  The  second 
element,  m,  of  the  enumerated  pairs  permits  functions 
to  have  complexities  that  are  only  almost  everywhere, 
rather  than  everywhere,  bounded  by  h. 

Lemma 2 For any class C of functions, if M reliably
E-identifies C then ∃ M', uniformly in M, such that M'
E-identifies C' = the class of finite variants of
functions in C.

Proof: Let S = (S_1, S_2, S_3, ...) be an effective
enumeration of all finite sequences of "functional
pairs" (x,y) ∈ NxN, such that for (x_i,y_i), (x_j,y_j):
[x_i = x_j] => [y_i = y_j].

For any finite functional pair sequence S, and
partial enumeration f_n, define S*f_n to be S
"concatenated" with the partial enumeration in the
sense that [(x_i,y_i) ∈ S*f_n] iff [(x_i,y_i) ∈ S or
((x_i,y_i) ∈ f_n and x_i > max{x: (x,y) ∈ S})].

Intuitively, S*f_n is just an initial "trial sequence"
followed by the inputted partial enumeration that has
been doctored so as not to contradict any of the
trial sequence pairs (i.e. to preserve the functional
character of the enumeration). Let f_v be a finite
variant of a function f ∈ C.

On f_v,1:
    c(1) := 1
    M([S_1*f_v,1]) is returned.

On f_v,n:
    If M([S_c(n-1)*f_v,n-1]) = M([S_c(n-1)*f_v,n]) then

        COMMENT: M appears to be stabilizing so perhaps
        the current trial sequence is one that alters
        the enumeration of f_v to that of some f that M
        can identify.

        c(n) := c(n-1)
        the common output is returned.

    otherwise:

        COMMENT: M is not stabilizing so things must
        be arranged to try a new trial sequence for
        the next partial enumeration, and a result
        must be output that is unquestionably
        different from the previous output (to ensure
        reliability).

        If c(n-1) = 1 then
            c(n) := n
            the previous output+1 is returned.
        Otherwise
            c(n) := c(n-1)-1
            the previous output+1 is returned.//

The method of this proof is essentially that given
for an analogous result for almost everywhere
identification by Minicozzi <1976>. Intuitively, the
initial portions of the enumeration of f_v are
replaced by increasingly long trial sequences, and
the altered partial enumerations of f_v are fed to M.
The goal is to stumble upon a trial sequence that
alters the enumeration of f_v to that of some f ∈ C.
Its achievement is detected by M's stabilization,
first suspected by M's agreement upon two consecutive
partial enumerations.
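
The "concatenation" S*f_n used in Lemma 2 is simple enough to
state as code. In this sketch a trial sequence and a partial
enumeration are both represented as lists of (x, y) pairs; the
function name star is hypothetical.

```python
def star(trial, partial_enum):
    """S*f_n: the trial pairs, then the enumerated pairs lying beyond them.

    A pair (x, y) of the partial enumeration is kept only if x exceeds
    every first coordinate occurring in the trial sequence, so the result
    never assigns two values to the same argument.
    """
    cutoff = max((x for x, _ in trial), default=-1)
    return list(trial) + [(x, y) for x, y in partial_enum if x > cutoff]
```

For example, `star([(0, 5), (1, 5)], [(0, 0), (1, 1), (2, 4), (3, 9)])`
gives `[(0, 5), (1, 5), (2, 4), (3, 9)]`: the first two pairs of
the enumeration are overridden by the trial sequence and the
rest pass through unchanged.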

The proof of Theorem 5 now follows by noting that given
a function f, for any E-variant f_v and arbitrary ε > 0,
∃ g such that [(g is a finite variant of f_v) and ∀n
#DIF(f,g,n)#/n ≤ E+ε]. //

The notion of E-variants employed thus far has a
feature that may or may not be acceptable, depending upon
the situation and the reader's inclinations. It is this: the
discrepancies between a function and an E-variant may be
bounded satisfactorily in an overall sense, while being
overwhelming for some particular range value. For example,
it is possible for a 0-1 valued 0-variant f_v of a 0-1 valued
function f to be "wrong" at every point where f assumes the
value 1, if only f(x)=1 implies x=2^k, k=0,1,2,... . This may
seem appropriate since f is, in a sense, close to the almost
everywhere 0 function f_v. On the other hand, viewing f as a
characteristic function, it can be argued that variants
should allow neither too many additions to the set nor too
many omissions. E-variants simply bound the proportion of
additions and omissions. Just as statistics distinguishes
between Type I and Type II errors, perhaps here also each
kind of error (i.e. inclusions, omissions) should be
separately bounded.

The  following  discussion  is  again  in  terms  of  total 
functions. 

Definition: DIF_r(f,g,n) = {x: r = f(x) ≠ g(x) for x ≤ n}

Definition: DENSDIF_r(f,g) = lim_n sup #DIF_r(f,g,n)# /
[P * #{x: f(x)=r for x ≤ n}# + (1-P)*n], where P is the
predicate: f(n)=r.

Definition: A function f_v is an E_range-variant of a
function f if max DENSDIF_r(f,f_v) ≤ E_range, where max is
over all r ∈ Range(f).

The definitions for E_range-identification and E_range-ID
follow those given in terms of E-variants, substituting
E_range for E.

A development very similar to that for E-variants seems
possible. There are corresponding versions of both Theorems
4 and 5.


THEOREM 6 0_range-ID strictly includes ID*.

Proof: By analogy with the proof of Theorem 4. Let C contain
a single recursive characteristic function c. Insert the
values of recursive functions r at the 2^k-th points where
c(x)=1, and the 2^k-th points where c(x)=0, for k=0,1,2,... .
Pad the given values of g_n accordingly.//

The calculation of DENSDIF_r is materially affected by
whether  or  not  the  range  of  a  function  is  finite.  The 
infinite  case  appears  to  pose  many  new  problems.  Since  non- 
fuzzy  and  fuzzy  (grammar)  languages  correspond  to  functions 
with  finite  range,  Theorem  7  will  be  stated  in  terms  of 
characteristic  functions  (the  extension  to  finite  valued 
membership  functions  is  obvious). 

THEOREM 7 For every h ∈ R, ε > 0, 0 ≤ E_range < 1, ∃ M,
uniformly in h, such that M reliably (E+ε)_range-identifies
the class of E_range-variants of the h-easy characteristic
functions.

Proof: By analogy with Theorem 5. In Lemma 1 the checks
performed after the complexity checks are altered to: If
g(n)=0 then check whether #DIF_0(t_i,g,n)#/#{x: t_i(x)=0 for
x ≤ n}# ≤ E+ε AND #DIF_1(t_i,g,n)#/n ≤ E+ε. Otherwise, if
g(n)=1 then do the checks obviously corresponding to those
just listed.//

How  these  two  notions  of  fuzzy  variants  are  related  is 
stated  in  the  next  proposition.  As  might  be  expected, 
bounding  the  number  of  discrepancies  for  each  member  of  the 
target's  range  results  in  the  overall  number  of 
discrepancies  being  bounded  also,  i.e. 

Proposition: Given that Range(f) is finite, f_v is an
E_range-variant of f implies that f_v is an E-variant of f.

Proof:

Lemma: (Σ_{i=1..n} m_i)/(Σ_{i=1..n} n_i) ≤ max_{1≤i≤n} (m_i/n_i),
where m_i ≤ n_i ≠ 0, and m_i, n_i ∈ N.

Proof: By induction on n.

The lemma is trivially true for n=1.

Suppose n=k.

First of all,
(a+b)/(c+d) ≤ b/d if a/c ≤ b/d, for a,b,c,d ∈ N; c,d ≠ 0.

Writing M/N for the largest of the k ratios, and letting the
sums to k-1 range over the remaining k-1 pairs,

(Σ_{i=1..k} m_i)/(Σ_{i=1..k} n_i)
    = (Σ_{i=1..k-1} m_i + M)/(Σ_{i=1..k-1} n_i + N)

where M/N = max_{1≤i≤k} (m_i/n_i).

By the inductive assumption
(Σ_{i=1..k-1} m_i)/(Σ_{i=1..k-1} n_i) ≤ M/N.

Therefore, by the introductory observation,

(Σ_{i=1..k-1} m_i + M)/(Σ_{i=1..k-1} n_i + N) ≤ M/N,

i.e. (Σ_i m_i)/(Σ_i n_i) ≤ max_i (m_i/n_i).//

Let M_nr = #DIF_r(f,f_v,n)#
and N_nr = #{x: f(x)=r and x ≤ n}#.

Then DENSDIF(f,f_v) = lim_n sup (Σ_{r∈R} M_nr / Σ_{r∈R} N_nr)

and max_r DENSDIF_r(f,f_v) = lim_n sup max_{r∈R} (M_nr/N_nr).

So to establish the proposition it suffices to show that:

Σ_{r∈R} M_nr / Σ_{r∈R} N_nr ≤ max_{r∈R} (M_nr/N_nr) ∀n.

But this, given that Range(f) is finite, is precisely what
the lemma shows.//
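
The inequality in the lemma can be spot-checked numerically.
The following sketch is a sanity check rather than a proof: it
uses exact rational arithmetic and exhaustively tests every
triple of pairs (m_i, n_i) with m_i ≤ n_i and 1 ≤ n_i ≤ 4. The
function name and the choice of bounds are hypothetical.

```python
from fractions import Fraction
from itertools import product


def lemma_holds(pairs):
    """Check (sum of m_i)/(sum of n_i) <= max of m_i/n_i for the given pairs."""
    total = Fraction(sum(m for m, _ in pairs), sum(n for _, n in pairs))
    return total <= max(Fraction(m, n) for m, n in pairs)


# Every pair (m, n) with 0 <= m <= n and 1 <= n <= 4.
candidates = [(m, n) for n in range(1, 5) for m in range(n + 1)]

# True if the inequality holds for every triple of candidate pairs.
all_triples_ok = all(lemma_holds(triple)
                     for triple in product(candidates, repeat=3))
```

Using `Fraction` rather than floating point avoids spurious
failures in the boundary cases where the two sides are equal,
for instance when all the ratios m_i/n_i coincide.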

The  discussion  thus  far  in  this  subsection,  has  been  in 
terms  of  functions  and  functional  learning.  However,  since 
it  has  dealt  with  total  recursive  functions,  the  translation 
into  a  linguistic  context  is  relatively  easy  following  the
analysis  given  in  section  3.1.  For  languages  with  recursive 
membership  functions  (and  this  includes  most  of  the  commonly 
used  types),  the  fuzzy  models  of  identification  presented 
permit  the  learning,  given  informant,  of  languages  by 
approximating  them  with  (fuzzy)  languages  that,  while 
infinitely  different,  are  sufficiently  similar. 

There are two seemingly troublesome points with this
translation for these "very approximate" models of
identification. First, the results are very enumeration
dependent and some enumeration of V_T*, corresponding to the
increasing enumeration of N, must be specified. The standard
lexicographical order seems a reasonable choice here. The
second point is that the functions corresponding to the
(non-fuzzy) and fuzzy (grammar) languages have finite
ranges, yet since an infinite number of discrepancies
between target and hypothesis are allowed by the previous
"very approximate" learning criteria, the functions
hypothesized may no longer have finite ranges. This is a
more serious difficulty than the enumeration dependence, but
can be overcome by enumerating partial recursive functions
with finite ranges (these form a "recursively enumerable
class" <Rogers, 1967>) rather than P wherever appropriate.
For non-fuzzy languages 0-1 valued partial recursive
functions must be enumerated.

More  precisely,  in  terms  of  fuzzy  (grammar)  recursive 
languages,  the  previous  definitions  can  be  altered  as 
follows. 

Definition: Given two fuzzy languages L and H (assumed
wnlg to share terminal vocabulary V_T), DIF_r(L,H,n) = {s:
r = m_L(s) ≠ m_H(s) and s is one of the first n strings in the
lexicographical ordering of V_T*}.

Definition: A language L_v is an E_range-variant of a
language L if max DENSDIF_r(m_L, m_Lv) ≤ E_range, for r ∈
Range(m_L).

The other definitions can be similarly altered.
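
As an illustration of the linguistic version of DIF_r, the
sketch below enumerates V_T* and collects the discrepancies at
one membership level. It assumes that "lexicographical ordering"
means ordering by length first and alphabetically within each
length; the function names and the way membership functions are
passed in are hypothetical.

```python
from itertools import count, islice, product


def strings(vocabulary):
    """Enumerate V_T* by length, then alphabetically within each length."""
    for length in count():
        for letters in product(sorted(vocabulary), repeat=length):
            yield "".join(letters)


def dif_r(m_L, m_H, r, n, vocabulary):
    """DIF_r(L, H, n): strings s among the first n with r = m_L(s) != m_H(s)."""
    return {s for s in islice(strings(vocabulary), n)
            if m_L(s) == r and m_H(s) != r}
```

Over the vocabulary {a, b} the first seven strings are the empty
string, a, b, aa, ab, ba, bb, so `dif_r(m_L, m_H, r, 7, "ab")`
inspects exactly those strings.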

Theorems 6 and 7 can then be reformulated as follows.

COROLLARY to THEOREM 6

The class of sets of recursive languages that can be
0_range-identified given informant, strictly includes
that which can be almost everywhere identified.

Call a recursive language with an h-easy membership
function an "h-easy language".

COROLLARY to THEOREM 7

∀ h ∈ R, ε > 0, 0 ≤ E_range < 1, ∃ M, uniformly in h, such
that M reliably (E+ε)_range-identifies the class of
E_range-variants of the h-easy languages.

Subjects for future research are:

*The definition of more general types of equivalence of
functions and their use in defining alternative notions of
"fuzzy" identification. Briefly, such tests might permit
h(x) to be within some (specifiable) neighborhood of t(x),
for hypothesis h and target t. Whereas currently it is the
proportion of points where the hypothesis does not equal (in
a non-fuzzy sense) the target function that determines the
acceptability of the hypothesis, technically fuzzy notions
of point equality appear to be both possible and desirable
here.

*The elimination of the current dependence upon the standard
enumeration as arbiter in determining the acceptable error.
This might be done by generalizing either to error relative
to some (arbitrary) fixed recursive enumeration of domains,
or to error relative to the particular (arbitrary)
enumeration which is presented to the inductive inference
machine.

In both cases, the modifications required to get results
corresponding to Theorems 6 and 7 are likely to be minor.